INTRODUCTION TO SOUPAC
A  User's  Guide 


The  SOUPAC  System 

The University of Illinois SOUPAC system consists of (1) a library of
statistical data processing programs residing in the University of Illinois
Department of Computer Science IBM 360 system, (2) programmed procedures
necessary to communicate among various data storage devices, the data
processing programs, and SOUPAC library subroutines, and (3) a special program
called the syntax interpreter, which translates the instructions on SOUPAC
program cards into instructions to the IBM 360 computer.

To  use  the  SOUPAC  system,  appropriate  commands  must  be  issued  to  the 
computer  by  means  of  a  program  card  deck.   If  the  user's  data  is  on  punched 
cards,  in  contrast  to  disk  or  tape,  then  these  data  cards  are  submitted  with 
the  program  deck. 

The program deck is made up of (1) Job Control Language cards, usually
referred to as JCL cards and sometimes called 360 system cards, which tell
the computer under which problem specification number (PS number) the program
is to be run and that the SOUPAC system is to be activated, and which may
also communicate additional information to the computer, especially about
special peripheral storage device requirements; and (2) SOUPAC statement cards
which, with a few exceptions, contain the names of the SOUPAC programs being
used for data processing in the job and the suboperations, if any, particular
to those programs. Some of the statement cards require program constants, or
parameters, to be assigned by the user to suit his own particular analysis.

Example 1 shows a typical SOUPAC job. Each line of Example 1 represents
a separate punched card. Each letter or character on a given line represents
a separate column on the punched card, with the leftmost letter or character
representing column one. This representation of a card deck will be used
throughout this manual.

Example 1.

A Complete SOUPAC Job

/*ID   PS=9214,DEPT=PSYCH,NAME=SMITH

//   EXEC   SOUPAC 

//SYSIN  DD   * 

CORREL    (CARDS) (PRINT) (S1/PRINT).

PRINC    (S1)(S2/P)(    )(100)(1). 

VARIM    (S2) (PRINT). 

ENDS 

DATA    (4)(4F2.0)

 1 4 2 0
-6 3 0 1
 2 7 8 3
 0 0-1 4
 2 3 2 8
 1 5 6 2
 8 9 1 1
 1 2 2 7
 7 0 6 0
 4 7-6-2
-1 4-2 0
 0 3 4 6
END# 
/* 

The JCL statements are easily identified by either /* or // in the first
two card columns. In fact, any card which has /* or // in the first two columns
is considered to be a JCL card whether it was meant to be one or not. In
Example 1, the first three cards and the last card of the deck are JCL cards.
All remaining cards are either SOUPAC statements or data. The next three
cards initiate the SOUPAC CORRELATIONS, PRINCIPAL AXIS, and VARIMAX programs
in that order. The quantities in parentheses are the parameters for the
respective programs. (The collection of parameters for an individual program
is called the parameter string for that program.) The ENDS card, exactly one
of which is always present in a SOUPAC job, signals to the syntax interpreter
program that the next cards, if any, comprise a data deck. The DATA card
indicates to the computer how the data is punched on the next cards. The
END# card is a signal that the end of the data deck has been reached. The
last card is the end of file card, which signals the computer that the end
of this job has been reached.

In the program in Example 1, as the reader will recognize after reading
subsequent sections, the output from the CORRELATIONS program is used
internally and directly as input to PRINCIPAL AXIS, and similarly the output
from PRINCIPAL AXIS is the input to VARIMAX. The capability of running a
series of programs in sequence in this way in one SOUPAC job is one of the
great strengths of the SOUPAC system.
II.   More  Information  About  Program  Cards 
A.   The  JCL  or  360  system  cards 

The  JCL  cards  are  needed  to  run  all  IBM  360  jobs,  not  just  SOUPAC  jobs, 
and  are  thus  necessarily  sometimes  very  complex.   However,  a  SOUPAC  job  can 
usually  be  run  with  the  four  cards  discussed  in  greater  detail  below  and 
illustrated  in  Example  1. 

1.  ID  card 

This  card  has  the  form 

/*ID    [  accounting  information  ] 

where the accounting information includes the PS number identifying
the account against which the cost of the computer run is to be
charged, the department code associated with the account, the user's
name, and possibly some other information. This is always the first
program card.

Since ID card information requirements may change with the frequent
changes in the overall IBM 360 system configuration, the user
is urged to keep up to date on these requirements, and he should
address any questions about ID card makeup to the SOUPAC consultants.

A  user  who  does  not  have  an  account  with  the  Computer  Services 
Office  will  need  to  apply  for  one. 

2.  EXEC  card 

This  card  has  the  form 

//   EXEC   SOUP 

or 
//  EXEC   SOUPAC 


where the user designates SOUP to invoke 5 intermediate storage
devices and SOUPAC to invoke 15 (see page 2B). In some cases it
will be necessary to punch additional information on the EXEC
card (see page 1A).

3.   SYSIN  card 

This  card  has  the  form 

//SYSIN  DD  * 

and  communicates  to  the  computer  that  the  cards  following  are 
SOUPAC  statement  cards  and  data  cards. 

4. End of file card

This card has just /* punched in columns 1-2. It signals
the end of the complete job.

Additional  JCL  cards  are  needed  in  special  cases,  as  when  the  input  comes 
from  a  user-supplied  disk  or  tape  rather  than  cards.   A  user  who  is  not  an 
experienced  programmer  will  require  the  assistance  of  a  SOUPAC  consultant  in 
such  cases. 
B.   SOUPAC  statement  cards 

There are three types of SOUPAC statements: the program parameter and
subparameter cards, which are the basic SOUPAC statement cards; $ control
cards; and # control cards and prolog cards.

The latter two types of rather specialized cards are discussed in the
section Options to a SOUPAC Job. The individual program writeups in this
manual describe how the particular program parameter cards are to be punched,
but a few general rules will be indicated here.

1. In general, program parameter and subparameter cards begin with a
three-character code (mnemonic), which is usually the first three
letters of the program name or subparameter option name. Note
that in Example 1 the parameter cards have CORREL, PRINC, and VARIM.
In each case, all that is required is COR, PRI, or VAR; however, if one
writes out more of the program name than just the first three letters,
it is often easier to remember which program is being used.

2.  Mnemonics  are  followed  by  parameters  to  the  program.   Parameters  for 
the  programs  vary  from  program  to  program;  parameters  to  all  programs 
are  defined  in  this  manual.   See  Table  1  for  a  list  of  parameter 
types  and  their  delimiters. 


3. All parameters are "set off" by delimiters, with different parameter
types being set off by different delimiters. In Example 1, each
program parameter is enclosed in a pair of parentheses, ( ).
The CORREL statement in Example 1 has three parameters, the PRINC
statement has five parameters, and the VARIM statement has two
parameters.

4. A period, sometimes called the terminal delimiter, must always be
punched after the last parameter used.

5.  Blanks  may  be  used  freely  to  improve  readability,  and  you  may  punch 
out  to  column  80. 

6.  Since  all  parameters  are  clearly  separated  by  delimiters,  any 
comments  desired  may  be  put  before,  between  or  after  parameters. 
For  example,  the  following  is  a  valid  SOUPAC  control  statement: 

CORRELATIONS  OF  RELIGIOUS  ATTITUDE  SURVEY  WITH  INPUT  FROM  (CARDS) 
AND  OUTPUT  TO  (PRINT)  AND  (PRINT). 

and  is  equivalent  to: 

COR ( CARDS ) ( PRINT )( PRINT ) . 

Table 1.

Parameter Types and Their Delimiters

Type                              Delimiters   Examples

Address                           ( )          (S1)  (P)
Integer (also called fixed        ( )          (1)  (20)
  point constant)
Index set - integer               ( )          (10,20)
Real (also called floating        * *          *2*
  point constant)
Index set - real                  * *          *1.,20.,2.*
Labels and character strings      " "          "ALPHA"


7. Some SOUPAC programs have subparameter options. Common examples of
programs which do and do not have subparameter statements are:

Subparameter statements      No subparameter statements

BALANOVA                     AUTOCORRELATIONS
FREQUENCY                    BISERIAL CORRELATIONS
MATRIX                       CORRELATIONS
TRANSFORMATIONS              OBLIMAX
and others                   PRINCIPAL AXIS
                             RANK ORDER
                             STANDARD SCORES
                             STEPWISE MULTIPLE CORRELATION
                             T-TEST
                             VARIMAX
                             and others




Programs which have subparameter statements must be terminated by an
ENDP card; programs which do not have subparameter statements must
not be terminated by an ENDP card.

Example 1 shows three programs which must not have ENDP cards.
Example 2 shows two programs which must have ENDP cards. Within the
TRANSFORMATIONS program in Example 2, we have used three subparameter
statements: ADD, DIVIDE, and OUTPUT. Within the MATRIX program
in Example 2, we have used two subparameter statements: MOVE and
MULTIPLY. Specific descriptions of the subparameter statements can
be found in the respective individual program descriptions.

Example  2. 

Sample  SOUPAC   Statements,   Subparameter   Statements  and  ENDP  Card 

TRA( CARDS). 

ADD(8)(9)(8). 

DIV(8)*2*(8). 

OUT(S1)(1,8).

ENDP 

MAT. 

MOVE(CARDS)(S2).

MUL(S1)(S2)(S3/P(F)). 

ENDP 
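The statement rules in item list B (a three-letter mnemonic, parameters set off by the delimiters of Table 1, free comments around parameters, and a mandatory terminal period) can be illustrated with a small Python sketch. It is not the SOUPAC syntax interpreter: `parse_statement` is a hypothetical helper, and it assumes that the first period outside any delimiter pair is the terminal delimiter, that comments contain no delimiter characters, and that ENDP and ENDS cards are handled separately.

```python
# Hypothetical sketch of SOUPAC statement scanning; not the real syntax interpreter.
OPENERS = {"(": ")", "*": "*", '"': '"'}  # delimiter pairs from Table 1


def parse_statement(text: str) -> tuple[str, list[str]]:
    """Return (mnemonic, parameters) for one SOUPAC statement.

    Text outside delimiters is treated as comment; the first period
    outside delimiters ends the statement.
    """
    mnemonic = text.strip()[:3].upper()  # only the first three letters matter
    params: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ".":
            return mnemonic, params  # terminal delimiter reached
        if ch in OPENERS:
            close, depth, j = OPENERS[ch], 1, i + 1
            while j < len(text) and depth:
                if ch == "(" and text[j] == "(":
                    depth += 1  # nested parentheses, e.g. (S3/P(F))
                elif text[j] == close:
                    depth -= 1
                j += 1
            if depth:
                raise ValueError(f"unclosed {ch!r} at position {i}")
            token = text[i:j]
            # Blanks are only for readability, except inside a quoted label.
            params.append(token if ch == '"' else token.replace(" ", ""))
            i = j
        else:
            i += 1  # mnemonic letters and comments are skipped
    raise ValueError("missing terminal period")


print(parse_statement("DIV(8)*2*(8)."))  # ('DIV', ['(8)', '*2*', '(8)'])
```

With this reading, the long commented CORRELATIONS statement of rule 6 and its short form yield the same mnemonic and parameter list, which is the equivalence the text describes.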

C.   The  Interaction  of  SOUPAC  Programs 

The point of running a SOUPAC job is to input a set of numbers and receive
as output a set of answers. Since each research problem is unique, it is
usually the case that no single SOUPAC program will do the complete analysis
required. Example 1 is a classic example of how output from one SOUPAC program
is used as input to another SOUPAC program. Data input and output to programs
is in the form of data matrices. A data matrix is a rectangular array of data;
the terms row and column of a data matrix have the obvious intuitive meanings.
Data matrices, when used as input or output to a program, are referenced by
use of an address, one of the parameter types listed in Table 1.

By reading the description in this manual of the CORRELATIONS program, we
find that the third parameter is the output address of the Pearson
product-moment correlation matrix. By reading the description of the
PRINCIPAL AXIS program, we find that the first parameter is the input address
and the second parameter is an output address. Similarly, in the VARIMAX
program, the first parameter is an input address and the second parameter is
an output address.

We  see  in  Example  1  that  the  output  address  of  CORRELATIONS  is  used 
as  the  input  address  to  PRINCIPAL  AXIS  and  that  the  output  of  PRINCIPAL  AXIS 
is  used  as  input  to  VARIMAX.   In  this  manner  we  process  the  raw  data,  step  by 
step,  to  derive  PRINCIPAL  AXIS  factors  rotated  according  to  the  VARIMAX 
criterion. 

Notice that implicit in a set of SOUPAC statements is an order in which
calculations are performed. In Example 1, the CORRELATIONS program is executed
first, then PRINCIPAL AXIS, then VARIMAX. Understanding this principle is
essential for understanding how SOUPAC performs input and output of data
matrices. In Example 1 we saw data matrices passed from program to program
by being stored on intermediate storage files, labeled S1 and S2.

Example  3a. 

Input  of  One  Card  Data  Deck 

MATRIX.

MOVE (CARDS) (S1).

ENDP

CORREL (S1) (PRINT) (S3/PRINT).

STANDARD SCORES (S1) (S2/PRINT) (PRINT) (1).

Example  3b. 

Input  of  Two  Card  Data  Decks 

CORREL ( CARDS ) ( PRINT ) ( S3/PRINT ) . 

STANDARD SCORES(CARDS)(S2/PRINT) (PRINT) (1).

Example 3 points out a critical detail in the proper use of addresses.
In Example 3a, one card deck is read and stored as a data matrix on S1 by use
of the MATRIX MOVE suboperation. This matrix is then used as input to both
CORRELATIONS and STANDARD SCORES. Notice that using S1 as input to
CORRELATIONS does not destroy the contents of S1, and that the data input to
STANDARD SCORES is the same as though CORRELATIONS had not been executed. As
long as an address is used only for input, the matrix stored at that address
remains unchanged.

Example 3b is similar to Example 3a in that the input address to
CORRELATIONS and STANDARD SCORES is again the same; however, in Example 3b
this address is now CARDS, not S1.

Using CARDS as an address differs from using an intermediate storage file
such as S1 or S2 in that each subsequent use of CARDS means "read the next
card data deck."

Therefore, Example 3b will read two data card decks, the first one being
read by CORRELATIONS and the second one being read by STANDARD SCORES.
Consecutive uses of an S-type address for input will cause the same data to
be read from that file each time; consecutive uses of CARDS for input will
cause the next card data deck to be read each time.
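The difference between CARDS and an S-type address can be modeled in a few lines of Python. This is only an illustration under stated assumptions: `AddressModel`, `read`, and `write` are hypothetical names, data decks are represented as lists of rows, and nothing here reflects how SOUPAC actually manages its storage devices.

```python
from collections import deque


class AddressModel:
    """Toy model of how SOUPAC input addresses behave (hypothetical names).

    CARDS consumes the next data deck on each use; an S-type address
    (S1, S2, ...) returns the same stored matrix every time it is read.
    """

    def __init__(self, card_decks):
        self.card_decks = deque(card_decks)  # decks in the order they follow ENDS
        self.files = {}                       # intermediate storage: "S1" -> matrix

    def read(self, address):
        if address == "CARDS":
            if not self.card_decks:
                raise RuntimeError("no data deck left to read")
            return self.card_decks.popleft()  # this deck is used up
        return self.files[address]  # reading does not alter the stored matrix

    def write(self, address, matrix):
        self.files[address] = matrix


# Example 3a: one deck, stored on S1, then read twice.
job_a = AddressModel([[[1, 2], [3, 4]]])
job_a.write("S1", job_a.read("CARDS"))
for_correl, for_scores = job_a.read("S1"), job_a.read("S1")

# Example 3b: CARDS used twice, so two different decks are consumed.
job_b = AddressModel([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
first_deck, second_deck = job_b.read("CARDS"), job_b.read("CARDS")
```

In the 3a case both reads of S1 see the same matrix; in the 3b case the two reads of CARDS see two different decks, and a third read would find none left.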

For  a  more  complete  discussion  of  SOUPAC  addresses,  read  the  section, 
SOUPAC  Input/Output  and  Intermediate  Storage. 
D.   Placement  of  Data  Decks 

Whenever CARDS is used as an input address, a data deck will be read.
Each data deck must be preceded by a data format card and followed by an
END# card. If more than one data deck is required, as would be the case in
Example 3b, the decks are placed in the order in which they will be read by
the SOUPAC control statements. See Example 4 for a SOUPAC job that uses more
than one data deck. Recall that there is one ENDS card in a SOUPAC job and its
primary function is to separate the SOUPAC statements from the data card decks.


Example 4.

SOUPAC  Job  With  Two  Card  Data  Decks 

/*ID  PS=9214,DEPT=PSYCH,NAME=SMITH

//   EXEC   SOUPAC 

//SYSIN   DD  * 

CORREL  (CARDS)  (PRINT)  (S3/PRINT). 

STANDARD  SCORES  (CARDS)  (S2/PRINT)  (PRINT)  (1).

ENDS 

DATA(10)(10F6.2) 

[data  deck  for  CORRELATIONS  Program] 

END# 
DATA(8)(8F10.3) 

[data  deck  for  STANDARD  SCORES  Program] 

END# 
/* 
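The placement rules for data decks (exactly one ENDS card, then for each deck a DATA format card, the data cards, and an END# card) can be expressed as a small Python check. `split_job` is a hypothetical helper that receives only the cards following //SYSIN DD *, stopping before the closing /*; it also assumes each format fits on a single DATA card, although the manual allows a long format to continue onto following cards.

```python
def split_job(cards):
    """Split SYSIN cards into SOUPAC statements and data decks (sketch).

    Assumes exactly one ENDS card and a one-card DATA format per deck.
    """
    stripped = [c.strip() for c in cards]
    if stripped.count("ENDS") != 1:
        raise ValueError("a SOUPAC job needs exactly one ENDS card")
    ends = stripped.index("ENDS")
    statements, decks, current = cards[:ends], [], None
    for card, text in zip(cards[ends + 1:], stripped[ends + 1:]):
        if current is None:
            if not text.startswith("DATA"):
                raise ValueError(f"expected a DATA format card, found {text!r}")
            current = {"format": text, "cards": []}  # start a new data deck
        elif text == "END#":
            decks.append(current)  # deck complete
            current = None
        else:
            current["cards"].append(card)  # keep card columns intact
    if current is not None:
        raise ValueError("the last data deck has no END# card")
    return statements, decks


# The SYSIN portion of Example 4, with one placeholder card standing in for each data deck.
job = [
    "CORREL  (CARDS)  (PRINT)  (S3/PRINT).",
    "STANDARD  SCORES  (CARDS)  (S2/PRINT)  (PRINT)  (1).",
    "ENDS",
    "DATA(10)(10F6.2)",
    "  1.00  2.00",
    "END#",
    "DATA(8)(8F10.3)",
    "     1.000",
    "END#",
]
statements, decks = split_job(job)
```

Each deck is returned with its format card, in the order the CARDS addresses will consume them.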

E.   The  Data  Matrix 

It  is  common  practice  to  represent  statistical  data  as  a  rectangular  array 
of  numbers  called  a  data  matrix.   This  convention  is  quite  natural  since  data 
samples  frequently  are  observations  taken  over  sets  of  variables.   For  example, 
suppose we have collected data on twenty people by asking them their sex, age,
height, weight, and income. If we wrote this information in tabular form on a
piece of paper, we would probably have six columns: one column for the
subjects' names, and one column for each of the five variables: sex, age,
height, weight, and income. There would be twenty rows in this table, one for
each subject. What we have then in tabular form is a rectangular representation
of our data sample, with the variables, including the name, as columns, and
the observations as rows.

From this representation it is a short step to punching the table onto cards,
one card per person. If there are more variables per observation than will fit
on one card, we use as many consecutive cards as are necessary. To make the job
of setting up a data matrix for reading as simple as possible, the SOUPAC user
should always follow these rules:


1.  Begin  each  observation  on  a  new  card. 

2. If there is more than one card per observation, do not "split"
a multi-column variable across a card boundary. For example,
if income takes six columns and there are three columns
left on the current card, go to a new card. Do not put the
first three columns of income on the end of the current card
and the last three columns on a new card.

3. Always keep the number of cards per observation constant
within a given data deck.

4. Always punch variables in identical card columns for each
observation. Each observation of a given data deck should have
the same card and card column organization.
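Rules 1, 2, and 4 amount to choosing one fixed card layout that every observation reuses. A minimal Python sketch of choosing such a layout, assuming 80-column cards and a list of field widths given in variable order (`layout_fields` is a hypothetical helper, not a SOUPAC facility), is:

```python
def layout_fields(widths, card_columns=80):
    """Place each variable's field on a card without splitting it (rule 2).

    Returns a list of (card number, first column) pairs, both counted from 1.
    Every observation reuses this layout, so cards per observation and card
    columns stay constant within the deck (rules 3 and 4).
    """
    placements, card, column = [], 1, 1
    for width in widths:
        if width > card_columns:
            raise ValueError(f"a {width}-column field cannot fit on one card")
        if column + width - 1 > card_columns:
            card, column = card + 1, 1  # not enough room left: start a new card
        placements.append((card, column))
        column += width
    return placements


# Eleven 7-column fields fill columns 1-77, leaving three columns,
# so a 6-column income field moves to the start of card 2.
print(layout_fields([7] * 11 + [6])[-1])  # (2, 1)
```

The card number of the last placement is the number of cards per observation for the whole deck.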


F.   The  Data  Format  Card  and  the  END#  Card 

Whenever CARDS is used as an input address, the SOUPAC input routines must
know how the input data matrix is represented on punched cards. In particular,
they must know:

1.  What  are  the  dimensions  (number  of  rows  and  columns)  of  the 
data  matrix  being  represented. 

2.  Which  card  columns  represent  which  variables. 

3.  How  many  cards  per  observation  (row)  are  there. 


The function of the data format card and the END# card is to provide precisely
this information.

Data format cards have the following form:

DATA ([number of variables]) ([format])

For example, Example 1 had the data format card:

DATA (4) (4F2.0)

Notice that the number of columns (variables) of the data matrix
must be explicitly indicated to the computer. The format is used to
describe how one observation is punched. All observations are assumed to
have the same card column structure, so the computer is able to determine
the number of rows of the data matrix simply by reading cards under
control of the format provided until it finds the END# card. Since the
format defines one observation, implicit in this definition is how many
cards per observation are used.
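Because every observation has the same structure, determining the number of rows amounts to grouping cards until the END# card appears. A Python sketch, assuming the number of cards per observation is already known from the format (`group_observations` is a hypothetical name), is:

```python
def group_observations(data_cards, cards_per_observation):
    """Group the cards of one data deck into observations (rows).

    Reading stops at the END# card; the number of rows is never declared,
    it is simply how many complete groups were found.
    """
    rows, current = [], []
    for card in data_cards:
        if card.strip() == "END#":
            if current:
                raise ValueError(
                    f"END# reached after {len(current)} card(s) of an incomplete observation")
            return rows
        current.append(card)
        if len(current) == cards_per_observation:
            rows.append(current)  # one complete observation
            current = []
    raise ValueError("data deck has no END# card")


deck = ["card 1 of obs 1", "card 2 of obs 1", "card 1 of obs 2", "card 2 of obs 2", "END#"]
print(len(group_observations(deck, 2)))  # 2
```

A deck whose card count is not a multiple of the cards per observation is reported rather than silently truncated.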

The format is a FORTRAN-type format and follows all the rules of FORTRAN,
with the added restriction that it may not be longer than 592 characters. If
the format itself does not fit on one card, one simply continues punching the
format onto the next card. No continuation marks are required, since the
computer stops reading format cards when it finds the right parenthesis which
closes the format string.

It  is  possible  to  write  very  complex  formats,  and  the  user  who  wishes  to 
explore  such  possibilities  is  referred  to  any  standard  FORTRAN  language  text. 
For  most  uses,  however,  the  X  and  F  field  specifications,  discussed  in  detail 
below,  will  be  quite  adequate. 

It is conventional in computer programming to refer to a particular
group of contiguous columns on a data card as a field. Hence a four-digit
number will occupy a field of at least four columns when punched on a data
card, and the field will be larger if an algebraic sign and/or a decimal
point is also punched. The field specifications in a format tell the computer
which fields are assigned to the variables being input and describe how the
numbers are to be interpreted. In particular:

1. The field specification nX, where n is an integer, tells the input
routine to skip n columns. These n columns are said to be a skip field.

2. The field specification Fw.d, where w and d are integers and w > d,
specifies that a field of w columns is to contain a decimal number and that,
in case the decimal point is not actually punched, the number is to be
divided by 10 to the d power before being stored in the computer. However,
if the decimal point is actually punched, then it overrides the d indicator.
(Remember: w must include a column for the decimal point!) For example, the
field specification F7.2 means that the corresponding field of 7 columns
contains a decimal number which is to be divided by 10^2, or 100, unless a
decimal point is punched.

The user should be warned that any blanks in a field read by an F field
specification are read as zeros! Suppose that card columns 6-10 are read
according to F5.1 and 572 is punched in columns 6-8; then the number stored
by the computer is 57200/10, or 5720.
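The Fw.d rules, including the blanks-as-zeros trap, are easy to misapply by hand, so here is a minimal Python sketch of how one field's characters map to a stored value. `read_f_field` is a hypothetical helper written from the rules stated in this section, not SOUPAC's input routine; the handling of a leading sign is an assumption.

```python
def read_f_field(field: str, d: int) -> float:
    """Interpret the characters of one card field under an Fw.d specification.

    Sketch of the stated rules: blanks count as zeros, a punched decimal
    point overrides d, otherwise the digits are divided by 10**d.
    A leading + or - sign is assumed to be allowed.
    """
    # Leading blanks add nothing to the value; any other blank becomes a zero digit.
    text = field.lstrip(" ").replace(" ", "0")
    if not text:
        return -0.0  # all-blank field; the keypunching hints say a blank is stored as minus zero
    sign = -1.0 if text[0] == "-" else 1.0
    digits = (text[1:] if text[0] in "+-" else text) or "0"
    if "." in digits:
        return sign * float(digits)  # punched decimal point wins
    return sign * int(digits) / 10 ** d


print(read_f_field("572  ", 1))    # 5720.0  (F5.1, the case described above)
print(read_f_field("   1234", 2))  # 12.34   (F7.2, no point punched)
print(read_f_field("  12.34", 2))  # 12.34   (F7.2, point punched)
print(read_f_field("-6", 0))       # -6.0    (F2.0, as in Example 1)
```

The first call shows why numbers should be punched right-justified: two trailing blanks multiply the intended 57.2 by one hundred.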

The field specifications must be separated by commas, but a sequence such
as F6.3,F6.3,F6.3,F6.3 can be represented by 4F6.3, which saves space and
keypunching labor. This representation is used in Example 1, where
DATA(4)(4F2.0) is punched instead of DATA(4)(F2.0,F2.0,F2.0,F2.0).

Data fields on the data cards are represented left to right in the
format. Example 5a indicates that we have ten variables and that these ten
variables are punched in consecutive five-column fields beginning in column
sixteen. Notice that the ten fields occupy only columns 16-65 of the 80 card
columns. The first 15 columns (the 15X skip field) and columns 66-80 will
simply be ignored.

If an observation goes over one card, use the / (slash) to indicate that
the data fields continue on the next card.

Hint: The number of data cards considered to be one observation is the
number of slashes in the format plus one; i.e., no slashes implies one card
per observation, and one slash implies two cards per observation.

With  minimum  practice,  format  reading  becomes  almost  automatic.   Example 
5b  can  be  read  as  follows: 


1.  There  are  twenty  variables  in  this  data  matrix. 

2.  a.   Beginning  in  card  column  one,  skip  five  columns. 

b.  Read  one  six  column  variable. 

c.  Skip  two  card  columns. 

d.  Read  one  three  column  variable. 

e.  Skip  four  card  columns. 

f.  Read  nine  variables  of  six  columns  each. 

g. Go to a second card (ignoring the last six columns of the
current card).

h. Skip the first twenty columns of the second card.

i. Read nine variables of six columns each.

Always  remember  that  there  are  80  columns  on  a  punched  card.   If  a  format 
specifies  a  card  which  has  more  than  80  card  columns,  an  error  will  result. 
Example  5c  shows  a  format  which  specifies  90  columns  for  the  first  card  of  a 
two-card  observation.   The  use  of  this  card  will  result  in  an  error. 

The number of variables should agree with the number of field
specifications, including replications, in the format. Experienced programmers
can sometimes make an exception to this rule in order to shorten format length,
but it is a dangerous practice, and a discrepancy between these two counts is
one of the more frequent errors made in punching the data format card.


Example  5a. 
Sample  Data  Format  Card 


DATA  (10)  (15X,10F5.0) 


Example 5b.
Data Format Card Indicating Two Cards Per Observation

DATA(20)(5X,F6.0,2X,F3.0,4X,9F6.0/20X,9F6.0)

Example 5c.
Invalid Data Format Card - More Than 80 Columns Specified

DATA(20)(10X,10F8.0) 

Example 5d.
More Variables Specified Than Data Fields

DATA(21)(5X,F6.0,2X,F3.0,4X,9F6.0/20X,9F6.0)
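The counting rules in this subsection (replication factors, one card per slash plus one, an 80-column limit per card, and agreement between the declared number of variables and the number of fields) can be checked mechanically. The Python sketch below handles only the X, F, and / specifications discussed here; `parse_format` and `check_data_card` are hypothetical helpers. Because they do not model FORTRAN's reuse of a format that runs out of fields, Example 5c is reported both for its 90-column card and for declaring 20 variables against 10 fields.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Card:
    columns: int = 0                           # columns spanned by the format on this card
    fields: list = field(default_factory=list)  # (first column, width, decimals) per variable


def parse_format(fmt: str) -> list:
    """Expand nX, rFw.d and / specifications into one Card per data card."""
    cards = []
    for spec in fmt.strip().strip("()").split("/"):  # each slash starts a new card
        card, column = Card(), 1
        for item in (s.strip() for s in spec.split(",")):
            if not item:
                continue
            skip = re.fullmatch(r"(\d+)X", item)
            real = re.fullmatch(r"(\d*)F(\d+)\.(\d+)", item)
            if skip:
                column += int(skip.group(1))
            elif real:
                width, decimals = int(real.group(2)), int(real.group(3))
                for _ in range(int(real.group(1) or 1)):  # replication factor, default 1
                    card.fields.append((column, width, decimals))
                    column += width
            else:
                raise ValueError(f"specification not handled by this sketch: {item!r}")
        card.columns = column - 1
        cards.append(card)
    return cards


def check_data_card(n_variables: int, fmt: str) -> list:
    """Return a list of problems (empty if none) for DATA(n_variables)(fmt)."""
    cards = parse_format(fmt)
    problems = [f"card {i} spans {c.columns} columns; a card has only 80"
                for i, c in enumerate(cards, start=1) if c.columns > 80]
    n_fields = sum(len(c.fields) for c in cards)
    if n_fields != n_variables:
        problems.append(f"{n_variables} variables declared, {n_fields} fields in format")
    return problems


for n, fmt in [(10, "15X,10F5.0"),                           # Example 5a
               (20, "5X,F6.0,2X,F3.0,4X,9F6.0/20X,9F6.0"),   # Example 5b
               (20, "10X,10F8.0"),                            # Example 5c
               (21, "5X,F6.0,2X,F3.0,4X,9F6.0/20X,9F6.0")]:  # Example 5d
    print(n, fmt, check_data_card(n, fmt) or "OK")
```

The number of `Card` objects returned by `parse_format` is the slash count plus one, matching the hint given earlier in this section.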

Once a data deck has been read into SOUPAC, all information concerning
which card columns contained which variables is lost. Within SOUPAC a data
sample is handled as a data matrix, and variables are referenced by variable
number, not by card column.


G.   Keypunching  Hints 

1. Whenever possible, code variables using numbers instead of letters.
For example, code sex as 0, 1 rather than F, M.

2. Avoid confusing O, the letter oh, and 0, the number zero. Oh and
zero have different hole codes in the punched card. To help avoid possible
confusion, it is common to see the letter written as Ø. Unless the meaning
of Ø is clear from the context in which it is used, assume that Ø means the
letter and 0 means the number. (Warning: other computer installations often
disagree on this convention and represent zero by Ø.)

3. Code missing data as blanks (no punches). Some researchers code
missing data as 9, 99, or 999, depending upon the number of card columns which
the variable uses. This method has the disadvantage that once the data cards
have been read, the variables are treated as matrix columns, and missing data
are not coded consistently for all variables. Also, those SOUPAC programs
which correct for missing data presume that missing data has been coded
as a blank. Within the computer a blank is represented as -0 (minus zero).

4. Punched cards which are warped or ragged-edged are likely to be
rejected, or perhaps misread, by the card reader, and such cards should be
replaced before they cause problems. The /*ID card in front of the deck
and the /* card at the end of the deck are especially susceptible to rapid
wear.

5. Overpunching is the attempt, usually unsuccessful, to punch more
information in one column of a card than is normally allowed. The result is
often an undefined hole combination, which the card reader rejects as a "read
check." Avoid overpunching.
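The minus-zero convention in hint 3 has a practical consequence: minus zero compares equal to an ordinary zero, so a test for missing data must look at the sign rather than the value. The Python sketch below mimics this with IEEE-754 negative zero as an assumed stand-in for the IBM 360 representation; `store_field` and `is_missing` are hypothetical helpers, and `store_field` uses Python's `float()` rather than the F-field rules.

```python
import math


def store_field(field: str) -> float:
    """Store one numeric card field; an all-blank field becomes minus zero."""
    if field.strip() == "":
        return -0.0  # blank = missing data
    return float(field)


def is_missing(value: float) -> bool:
    """True only for minus zero; a punched 0 is a real observation."""
    return value == 0.0 and math.copysign(1.0, value) < 0


values = [store_field(f) for f in ["  3", "   ", "  0"]]
print(values)                           # [3.0, -0.0, 0.0]
print([is_missing(v) for v in values])  # [False, True, False]
```

In this sketch a field punched as -0 would also be treated as missing, which is one more reason to leave missing values blank rather than inventing codes.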


Programming errors are difficult to avoid in writing even simple programs.
Skill in the detection and correction of errors, or "debugging," is an
important facet of the programming art. The occasional SOUPAC user usually
does not acquire the experience necessary to become adept at detecting errors,
but there are a number of things that he or she can do to minimize them and
to detect the more obvious ones.

One of the simplest and most important error-saving strategies is one
that is seldom used: avoid rushing the job! Several days should be
allowed for writing and debugging short programs, and as much as several weeks
for complex ones. Even when debugging the program is a minor problem,
unexpectedly long turnaround times due to heavy job volumes or computer
breakdowns can foul tight schedules. Long-range planning is a key part of data
processing.

Programming errors can be broadly categorized into two types, depending
on whether they disrupt the translation of the program into machine
instructions ("compile-time" errors) or disrupt the computer's actual
execution of the machine instructions ("execution-time" errors). Occurrence
of either type of error will normally cause the computer to print an
appropriate error message for diagnostic purposes. The user will usually find
that compile-time errors, which are most often caused by mispunching or
deleting characters, are relatively easy to find. However, execution-time
errors are often due to incorrectly punched data or an incorrect format
statement, resulting in unexpected numbers being processed, and considerable
time and ingenuity may be needed to run such an error to ground.

Here  are  some  debugging  hints: 

1. Obviously the program deck should be carefully checked for keypunching
errors before it is submitted to be run.

2.  Make  sure  that  the  cards  are  in  the  correct  order. 

3. Missing ENDS and ENDP cards are a common source of error. Check
for these.

4. After correcting an error for which an error message has been
explicitly generated by the computer, carefully reexamine the entire program.
The computer may not detect all errors on the same run, and you may find one
which the computer has not yet seen but which would abort the next run.

5. Make sure that the parameters specified on the /*ID card are such
that the job can run to completion.

6.  TAKE  YOUR  TIME! 


DATA  AND  MATRIX  MANIPULATIONS  PACKAGE 


The  data  manipulative  programs 


As implied by the name, TRANSFORMATIONS is used for performing
data transformations often necessary in "setting up" data to be input
to one of the "cook-book" programs of the SOUPAC system. Using the
TRANSFORMATIONS program, the user may create new variables as linear
combinations or as algebraic functions of old variables, and may recode or
alter data values on the basis of test conditions. To perform matrix algebra
operations with one or more matrices, to augment matrices either row-wise or
column-wise, or to reorder, save, or delete specific rows or columns of a
matrix, one would use the MATRIX program. MATRIX also has the capability of
printing, punching card decks, and reading and writing tape and disk files
in ways not available elsewhere in SOUPAC.

The TRANSFORMATIONS and MATRIX programs have a unique place in the
SOUPAC system. Although the remainder of the SOUPAC library performs
a large number of specialized statistical procedures, there are some
computations which are not represented by a uniquely written program.
However, by an imaginative utilization of the combined powers of
TRANSFORMATIONS and MATRIX it is possible to perform a virtually unlimited
range of established and experimental statistical techniques. It is a
common practice within the SOUPAC office to "check out" newly written
programs against results computed by TRANSFORMATIONS and MATRIX. Conversely,
these two programs can be used as teaching tools by having students learn
the step-by-step computations and then check the results against results
of the "cook-book" programs. Complete multiple regression and analysis
of variance programs, for example, have been written in this manner. Since
most SOUPAC jobs require the use of TRANSFORMATIONS and MATRIX, a
familiarity with these two programs is basic to an effective use of the
SOUPAC system.


MATRIX 


I.   General  Description 

The MATRIX program is a data manipulating program for inputting and
outputting, creating, performing matrix algebraic operations on, and generally
handling data matrices. All the MATRIX suboperations are restricted to 1000
columns (variables). No absolute limit is set on the number of rows
(observations).

Standard SOUPAC address conventions are used, including the use of the
character X to denote punched output and (F) after a print to denote print
with F format. Also available, and discussed in section III below, are the use
of I for storing a matrix in memory and the use of (L) after a print to
invoke the MATRIX labeling feature. All other restrictions are noted in the
descriptions of the individual suboperations.

II. Parameters

A.   Main  Parameters 

The MATRIX program is invoked by coding the name MATRIX, or simply the
mnemonic MAT, on a program card. There are two optional parameters
available which may be coded on the MAT card.

1. If it is desired that the program print, immediately prior to the
execution of each MATRIX subparameter operation, the time in seconds
since entry into the MATRIX program, code a (1) after the name MATRIX.
This option is not normally needed and is provided merely for giving
timing estimates. Example:

MATRIX (1).

2. The second optional parameter is coded as a (1) following the timing
estimate parameter. This second option is used to suppress printing
by the program of the number of rows and columns and the precision
of all answer (i.e., output) matrices. Normally, the program will
always print out this information. Example:

MATRIX ( )(1).

If  both  options  one  and  two  are  desired  use: 

MATRIX  (1)(1). 

If  neither  option  is  desired,  as  is  generally  the  case,  use: 

MATRIX. 


Note  that  as  in  all  SOUPAC  programs,  the  main  program  statement,  and  also 
each  subprogram  statement,  must  be  terminated  by  a  period. 
B. Subparameters

Any MATRIX operation may be invoked by coding its mnemonic followed by the
appropriate subparameters. All operations in MATRIX handle both single
and double precision matrices under the control of the user (see operations
SINGLE and DOUBLE). For an address not explicitly assigned either single
or double precision, MATRIX assumes a default of double precision for
output to the address. Terminate all subparameter statements with a
period.

To  end  a  MATRIX  program,  place  a  card  which  has  the  characters  END  P 
after  the  last  MATRIX  subparameter  card.   Since  all  MATRIX  programs  must 
have  at  least  one  subparameter  operation,  an  error  will  be  signaled  if  a 
MAT  card  is  followed  immediately  by  an  END  P  card. 

Input and output for MATRIX may be from any source; however, the
following rules must be observed:

1)  Never  use  CARDS  as  input  to  any  operation  except  MOVE 
unless  both  the  number  of  rows  and  the  number  of  columns 
have  been  specified  on  the  DATA  format  card  at  the  front 
of  the  data  deck. 

2) You may never have an output address of only (PRINT) or
(P). All MATRIX output must go to some intermediate
storage location even when only printout is desired; for
example (S1/P).

3)  Avoid  using  the  same  address  more  than  once  on  the  same 
parameter  card  unless  otherwise  noted  in  the  description 
of  an  individual  operation.   However,  in  those  operations 
which  do  permit  using  an  address  more  than  once  as  an 
input  address,  CARDS  may  not  be  used  as  an  input  address 
more  than  once.   In  all  operations  except  INVERT,  never 
specify  an  output  address  which  is  the  same  as  an  input 
address  for  that  operation. 

4) The contents of an input address remain unchanged during
the execution of an operation unless otherwise noted.


Following  is  a  description  of  the  subparameter  operations  currently  in 
the  MATRIX  program. 


Summary  of  MATRIX  Operations 
mnemonic  notes   operation name                         examples

ABS               Absolute value                         ABS (S1) (S2).
ADD       1,2,5   Add                                    ADD (S1) (S2) (S3).
ALL       3,6     All                                    ALL (S1) "GT" *0* (S2).
                                                         ALL (S1) "NE" *-0* (S2) (2).
                                                         ALL (S1) "LE" (S2) (S3) (1,10,3).
ANY       3,6     Any                                    ANY (S1) "GE" (S2) (S3).
                                                         ANY (S1) "EQ" *1.* (S2) (1,2) (5).
                                                         ANY (S1) "LT" *99* (S2).
CHO       4       Cholesky decomposition                 CHO (S1) (S2).
COL       6       Column delete                          COL (S1) (S2) (2).
                                                         COL (S1) (S2) (10) (12,15).
CON               Constant addition                      CON (S1) *2.* (S2).
                                                         CON (S1) (S2) (S3).
COU               Count                                  COU (S1) (S2).
                                                         COU (S1) (S2) (1).
                                                         COU (S1) (S2) (2).
CRO               Cross product                          CRO (S1) (S2).
                                                         CRO (S1) (S2) (1).
DIA               Diagonal to vector                     DIA (S1) (S2).
DIM               Dimension                              DIM (S1) (S2).
DOU       2       Double precision                       DOU (S1).
EJE               Eject                                  EJE.
E-D       1,2,5   Elementwise divide                     E-D (S1) (S2) (S3).
E-M       1,2,5   Elementwise multiply                   E-M (S1) (S2) (S3).
E-R               Elementwise square root                E-R (S1) (S2).
EXP               Expand                                 EXP (S1) *20* (S2).
                                                         EXP (S1) (S2) (S3).
FIL       2       File                                   FIL (S1).
GEN       6       Generate                               GEN (S1) *1.*.
HOR       1,2,5   Horizontal augment                     HOR (S1) (S2) (S3).
IDE               Identity matrix                        IDE (8) (S1).
INP               Input                                  INP (S1) (S2) ( ) (9) "(9F6.1)".
                                                         INP (S1) (S2) ( ) (12).
                                                         INP (S1) (I) (576) (10).
INV       4       Invert                                 INV (S1) (S2).
                                                         INV (I) (I).
                                                         INV (S1) (S2) (1) (1) *10.E-6*.
KRO               Kronecker product                      KRO (S1) (S2) (S3).
LAB       6       Label                                  LAB (S1) "SAMPLE 1" "AGE" "SEX".
LAG               Lag                                    LAG (S1) (2) (6) ( ) (S2).
                                                         LAG (S1) (3) (4) (1) (S2).
LOW               Lower triangle                         LOW (S1) (S2).
                                                         LOW (S1) (S2) (1).
MAX               Maximum value                          MAX (S1) (S2).
                                                         MAX (S1) (S2) (1).
                                                         MAX (S1) (S2) (2).
MIN               Minimum value                          MIN (S1) (S2).
                                                         MIN (S1) (S2) (1).
                                                         MIN (S1) (S2) (2).
MOV       2       Move                                   MOV (C) (S1).
                                                         MOV (S1) (S2).
MUL       1,4,5   Multiply                               MUL (S1) (S2) (S3).
OUT               Output                                 OUT (S1) (S2) "(10E16.9)".
                                                         OUT (S1) (S2).
PAR               Partition                              PAR (S1) (S2) (5) (10) (2) (21).
PER       6       Permutation                            PER (S1) (2) (1).
PRI               Print                                  PRI (S1) "(' ',10F13.4)".
PUN               Punch                                  PUN (S1) "(8F10.3)".
REC               Reciprocal                             REC (S1) (S2).
REM               Remap                                  REM (S1) (S2) (8).
REW       2       Rewind                                 REW (S1).
ROW       6       Row delete                             ROW (S1) (S2) (1) (2) (3).
                                                         ROW (S1) (S2) (2,20,3) (3,20,3).
RSD               Reciprocal of square root of diagonal  RSD (S1) (S3).
SCA               Scalar multiply                        SCA (S1) *.1* (S2).
                                                         SCA (S1) (S2) (S3).
SIN       2       Single precision                       SIN (S1).
SUB       1,2,5   Subtract                               SUB (S1) (S2) (S3).
SUM               Sum                                    SUM (S1) (S2).
                                                         SUM (S1) (S2) (1).
                                                         SUM (S1) (S2) (2).
TRA       4       Transpose                              TRA (S1) (S2).
UPP               Upper triangle                         UPP (S1) (S2).
                                                         UPP (S1) (S2) (1).
VEC               Vector to diagonal                     VEC (S1) (S2).
VER       1,2,5   Vertical augment                       VER (S1) (S2) (S3).

Notes:

1. Conformability of input matrices is checked.

2. Up to twenty-one total addresses may be used.

3. A warning message is printed if no rows are output.

4. Any matrix which has been previously stored under the I address will
be destroyed.

5. An input address may be used more than once for input to the same
instruction.

6. As many arguments as are needed of the last argument type may be used.
ABSOLUTE VALUE (mnemonic: ABS)

The  ABSOLUTE  VALUE  operation  has  two  address  parameters,  an  input 
address  and  an  output  address.   The  absolute  value  of  each  element  of 
the  input  matrix  is  taken  and  the  result  goes  to  the  output  address. 
Example:

ABSOLUTE VALUE (SEQ3) (SEQ4).


ADD  (mnemonic:   ADD) 

The ADD operation has from three to twenty-one address parameters.
The last address is the output address; all other addresses are for
input. Each input matrix must have the same number of rows and columns
as all other input matrices. An address may be used more than once
as an input address.

Corresponding elements of the first matrix through the next-to-last
matrix are added together, and the result goes to the output
address. Examples:

ADD (SEQ1)(SEQ4)(SEQ5).

ADD (SEQ1) (SEQ4) (SEQ2) (SEQ3) (SEQ5).


ALL (mnemonic: ALL)


The ALL operation performs a particular test, specified by the
second operand as a relational operator, between a set of elements for
each input row and a floating point number specified by the third
operand. If all elements of the set for a given row pass the test,
that row is output to the output address.

The first parameter is the input address. The relational
operator is enclosed in quotation marks. The third operand may be
either a floating point number or an address, in which case the
first element of the matrix is used as the floating point number.
The fourth parameter is the output address.
The six legal relational operators are "LT", "LE", "EQ", "NE",
"GT", and "GE". Remaining (optional) parameters are index
sets specifying which variables are to be included in the testing.
If no variables are specified, all variables are included in the
testing. Note that if only one variable is specified, the results
of the ANY and the ALL operations would be the same. Examples:

ALL (SEQ1) "NE" *0.* (SEQ2).
ALL (SEQ3) "GE" (SEQ4) (SEQ1).

ANY  (mnemonic:  ANY) 


The  ANY  operation  performs  a  particular  test,  specified  by  the 
second  operand  as  a  relational  operator,  between  a  set  of  elements  for 
each  input  row  and  a  floating  point  number  specified  by  the  third 
operand.   If  any  element  of  the  set  for  a  given  row  passes  the  test, 
that  row  is  output  to  the  output  address. 
The first parameter is the input address. The relational
operator is enclosed in quotation marks. The third operand may be
either a floating point number or an address, in which case the
first element of the matrix is used as the floating point number.
The fourth parameter is the output address.
The six legal relational operators are "LT", "LE", "EQ", "NE", "GT",
and "GE". Remaining (optional) parameters are index sets specifying
which variables are to be included in the testing. If no variables
are specified, all variables are included in the testing. Note that
if only one variable is specified, the results of the ANY and the ALL
operations would be the same. Examples:

ANY (SEQ2) "GT" *3.* (SEQ1).

ANY (SEQ1) "NE" *-0.* (SEQ4) (1,3).
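As a rough illustration of the row-selection rule shared by ALL and ANY, the Python sketch below (not SOUPAC code) keeps the rows whose tested elements pass a relational test. The function name select_rows and the list-of-rows representation are assumptions for illustration; missing-data handling is not modeled, and the address form of the third operand would simply mean passing that matrix's first element as value.

```python
import operator

# Relational operators accepted by ALL and ANY, mapped to Python comparisons.
RELATIONS = {
    "LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
    "NE": operator.ne, "GT": operator.gt, "GE": operator.ge,
}


def select_rows(matrix, relation, value, columns=None, mode="ALL"):
    """Return the rows of `matrix` whose tested elements pass `relation` against `value`.

    `columns` lists 1-based column numbers to test (None tests every column).
    mode="ALL" requires every tested element to pass; mode="ANY" requires one.
    """
    test = RELATIONS[relation]
    combine = all if mode == "ALL" else any
    kept = []
    for row in matrix:
        cols = columns if columns is not None else range(1, len(row) + 1)
        if combine(test(row[c - 1], value) for c in cols):
            kept.append(row)
    return kept


data = [[1, 0, 5], [2, 3, 4], [0, 0, 0]]
print(select_rows(data, "GT", 0))              # [[2, 3, 4]]
print(select_rows(data, "GT", 0, mode="ANY"))  # [[1, 0, 5], [2, 3, 4]]
```

Testing a single column makes the two modes agree, which is the point made in the note on ALL and ANY.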

CHOLESKY (mnemonic: CHO)

The  CHOLESKY  operation  decomposes  a  square  symmetric  matrix  into 
the  product  of  an  upper  triangular  matrix  and  a  lower  triangular 
matrix  such  that  the  two  triangular  matrices  are  the  transpose  of 
each  other.   This  method  of  decomposition  is  sometimes  called  the 
"square  root"  method.   CHOLESKY  has  two  operands,  an  input  address 
and  an  output  address.   The  result  which  goes  to  the  output  address 
is  the  lower  triangular  matrix  resulting  from  the  decomposition. 

If  the  input  matrix  is  not  square,  the  "extra"  rows  or  columns 
are  ignored.   Additionally,  if  the  square  matrix  is  not  symmetric, 
the  actual  upper  triangle  of  the  matrix  is  effectively  ignored 
and  is  instead  assumed  to  be  identical  to  the  lower  triangle. 
Example:

CHOLESKY (SEQ1) (SEQ2).

If the input matrix on SEQ1 is

     4   -2   -4
    -2    2    3
    -4    3    6

the resulting matrix output to SEQ2 is

     2    0    0
    -1    1    0
    -2    1    1
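The worked example above can be checked with a short Python sketch of the square-root (Cholesky) method. This is a textbook formulation, not the SOUPAC routine; the name cholesky_lower is hypothetical. Like CHOLESKY, it reads only the leading square block and the lower triangle of its input.

```python
import math


def cholesky_lower(a):
    """Return lower-triangular L such that L times its transpose equals a.

    Only the leading square block of `a` and its lower triangle are read,
    so extra rows/columns and the upper triangle are ignored.
    math.sqrt raises ValueError if `a` is not positive definite.
    """
    n = min(len(a), len(a[0]))
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


print(cholesky_lower([[4, -2, -4], [-2, 2, 3], [-4, 3, 6]]))
# [[2.0, 0.0, 0.0], [-1.0, 1.0, 0.0], [-2.0, 1.0, 1.0]]
```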


COLUMN  DELETE  (mnemonic:   COL) 


The COLUMN DELETE operation specifies which columns of an input
matrix are to be deleted before sending the result to the output
address. The first parameter is the input address. The second
parameter is the output address. Columns to be deleted are
specified by index sets following the output address. Examples:

COLUMN DELETE (SEQ4) (SEQ5) (4) (5) (6) (8) (13) (15).

COLUMN DELETE (SEQ1) (SEQ2) (1,30,3).
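The index sets used by COLUMN DELETE (and by ALL, ANY, and other operations) are not defined in this section. The Python sketch below assumes, purely for illustration, that (k) names column k and that (first,last,step) names first, first+step, and so on through last, with a step of 1 when it is omitted; check the general SOUPAC index-set conventions before relying on this reading. The helper names are hypothetical.

```python
def expand_index_set(spec):
    """Turn one index set into a list of 1-based column numbers.

    ASSUMED meaning (not defined in this section): (k) -> [k];
    (first, last) or (first, last, step) -> first, first+step, ..., up to last.
    """
    if len(spec) == 1:
        return [spec[0]]
    first, last = spec[0], spec[1]
    step = spec[2] if len(spec) == 3 else 1
    return list(range(first, last + 1, step))


def column_delete(matrix, *index_sets):
    """Return a copy of `matrix` without the columns named by the index sets."""
    doomed = set()
    for spec in index_sets:
        doomed.update(expand_index_set(spec))
    return [[x for j, x in enumerate(row, start=1) if j not in doomed]
            for row in matrix]


print(expand_index_set((1, 10, 3)))                            # [1, 4, 7, 10]
print(column_delete([[1, 2, 3, 4], [5, 6, 7, 8]], (2,), (4,)))  # [[1, 3], [5, 7]]
```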
CONSTANT (mnemonic: CON)

The CONSTANT operation has three parameters. A floating point
number specified by the second operand is added to every element of
the matrix specified by the first operand. The result goes to
the third operand address.

The second operand may be either a floating point number enclosed
in asterisks or a standard SOUPAC input address. If an address is
specified, the first element of the matrix at the address is used for
the floating point number. Examples:


CONSTANT (SEQ1) *4.5* (SEQ2).
CONSTANT (SEQ1) (SEQ3) (SEQ4).


COUNT  (mnemonic:   COU) 

The COUNT operation has three operands: an input address, an output
address, and an option indicator.

If option 0 is specified, the resulting output matrix is a single
row vector containing a count of the number of elements, excluding
missing data, of each column of the input matrix. Specifying no
option is equivalent to specifying option 0.

If option 1 is specified, the resulting output matrix is a single
column vector containing a count of the number of elements, excluding
missing data, of each row of the input matrix.

If the option is specified as any number other than 0 or 1, a
single-element matrix is output which contains a count of the
number of elements, excluding missing data, over the entire matrix.
Examples:


COUNT (SEQ1) (SEQ3).

COUNT (SEQ5)(SEQ3)(1).

COUNT (SEQ2)(SEQ3)(2).
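The three COUNT options can be summarized in a small Python sketch. This section does not say how SOUPAC codes missing data, so the sketch assumes that None marks a missing element; the function name count is hypothetical.

```python
def count(matrix, option=0):
    """Count non-missing elements the way COUNT's options describe.

    ASSUMPTION: None marks a missing element.
    option 0 -> one row, a count per column; option 1 -> one column,
    a count per row; any other option -> a 1 x 1 count for the whole matrix.
    """
    present = [[x is not None for x in row] for row in matrix]
    if option == 0:
        return [[sum(col) for col in zip(*present)]]
    if option == 1:
        return [[sum(row)] for row in present]
    return [[sum(sum(row) for row in present)]]


m = [[1, None, 3], [None, None, 6]]
print(count(m), count(m, 1), count(m, 2))  # [[1, 0, 2]] [[2], [1]] [[3]]
```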

CROSS  PRODUCT  (mnemonic:   CRO) 


The CROSS PRODUCT operation has three operands: an input address, an
output address, and one option flag. The matrix sent to the output
address is the square symmetric matrix

$X^{T}X$

which results from the matrix multiplication of $X^{T}$ and $X$, where $X$ is the
input matrix and $X^{T}$ is the transpose of the input matrix.

The option flag, coded as (1), is used whenever it is desired that the
X matrix used in forming the cross-products matrix be the input matrix
with an additional column of 1's as the first column of the matrix. Note
that when the option is used, the output matrix will have one more row
and column than if the option is not used.
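A plain-Python sketch of the cross-product computation, including the optional leading column of 1's, follows. The name cross_product is hypothetical. With the option, the first row and column of the result hold the number of rows and the column sums of X, which is what makes the option convenient when an intercept term is wanted.

```python
def cross_product(x, add_ones=False):
    """Return the matrix X^T X; add_ones=True first prepends a column of 1's to X."""
    if add_ones:
        x = [[1] + list(row) for row in x]
    ncols = len(x[0])
    return [[sum(row[i] * row[j] for row in x) for j in range(ncols)]
            for i in range(ncols)]


x = [[1, 2], [3, 4], [5, 6]]
print(cross_product(x))        # [[35, 44], [44, 56]]
print(cross_product(x, True))  # [[3, 9, 12], [9, 35, 44], [12, 44, 56]]
```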
DIAGONAL (mnemonic: DIA)

The  DIAGONAL  operation  has  two  operands,  an  input  address  and 
an  output  address.   The  main  diagonal  elements  of  the  first 
matrix  are  used  to  form  a  single  row  vector  which  is  output  to  the 
second  operand  address.  Example: 

DIAGONAL (SEQ2)(SEQ4).

DIMENSION  (mnemonic:  DIM) 

The DIMENSION operation has two address parameters, an input
address and an output address. The number of rows and the number
of columns of the input matrix are used to form the first and
second elements, respectively, of a two-element, single-row matrix
which is output to the output address. Example:

DIM (SEQ2)(SEQ1).

DOUBLE (mnemonic: DOU)

The DOUBLE operation has anywhere from one to twenty-one
addresses as parameters. Listing an address as a parameter negates
the effect of any previous listing of that address as a parameter
in the operation SINGLE. Listing an address as a parameter which
has not appeared as a SINGLE subparameter has no effect. Example:

DOUBLE (SEQ1) (SEQ2) (SEQ3) (SEQ4) (SEQ5) (I).

EJECT (mnemonic: EJE)

The  EJECT  operation  causes  the  next  printout  to  begin  at  the 
top  of  a  new  page.  EJECT  has  no  parameters.  Example: 

EJECT. 


E-DIVIDE -- Elementwise Divide -- (mnemonic: E-D)

The E-DIVIDE operation has from three to twenty-one address
parameters. The last address is the output address; all other
addresses are for input. Each input matrix must have the same
number of rows and columns as all other input matrices. An address
may be used more than once as an input address.

Elements of the second matrix through the next-to-last matrix are
divided into the corresponding elements of the first matrix. Output
goes to the last address. Example:

E-DIVIDE  (SEQ1)(SEQ2)(SEQ3). 
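The left-to-right rule for E-DIVIDE (the first matrix divided by the second, that result divided by the third, and so on) can be sketched in Python as follows. The function name e_divide is hypothetical; the shape test stands in for the conformability check, and division by zero is not handled.

```python
from functools import reduce


def e_divide(*matrices):
    """Divide the first matrix elementwise by the second, then by the third, and so on."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    for m in matrices:
        if len(m) != rows or any(len(r) != cols for r in m):
            raise ValueError("input matrices are not conformable")

    def divide(a, b):
        # Elementwise quotient of two same-shaped matrices.
        return [[p / q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

    return reduce(divide, matrices)


print(e_divide([[8, 9]], [[2, 3]], [[2, 1]]))  # [[2.0, 3.0]]
```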

E-MULTIPLY -- Elementwise Multiply -- (mnemonic: E-M)

The E-MULTIPLY operation has from three to twenty-one address
parameters. The last address is the output address; all other
addresses are for input. Each input matrix must have the same
number of rows and columns as all other input matrices. An address
may be used more than once as an input address.

Corresponding elements of the first matrix through the next-to-last
matrix are multiplied together. Output goes to the last
address. Examples:

E-MULTIPLY (SEQ1) (SEQ1) (SEQ2).
E-MULTIPLY (SEQ3) (SEQ2) (I) (SEQ4).

E-ROOT -- Elementwise Square Root -- (mnemonic: E-R)

The E-ROOT operation has two address parameters, an input address
and an output address. The (positive) square root of each element
of the input matrix is taken and the result goes to the output
address. Example:

E-ROOT (SEQ1)(SEQ2).

EXPAND (mnemonic: EXP)

The EXPAND operation takes the first row of the first input matrix
and repeatedly outputs that same row to the output address the number
of times specified by the second parameter.

The second parameter can be either a floating point number, in which
case the input row is copied to the output address the number of times
specified by the integer portion of the floating point number, or an
input address, in which case the input row is copied to the output
address until the output matrix has the same number of rows as the
second input matrix. The third parameter is the output address.
Examples:

EXPAND (SEQ1)(SEQ2)(SEQ3).
EXPAND (SEQ1) *55* (SEQ4).
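A Python sketch of EXPAND's two forms, a repeat count or a template matrix whose row count is matched, follows. The function name expand is hypothetical, and int() is used to take the integer portion of a positive count.

```python
def expand(matrix, count_or_template):
    """Repeat the first row of `matrix`.

    A number gives the repeat count (its integer portion is used); a matrix
    gives a template, and the row is repeated to match its number of rows.
    """
    if isinstance(count_or_template, (int, float)):
        n = int(count_or_template)
    else:
        n = len(count_or_template)
    return [list(matrix[0]) for _ in range(n)]


print(expand([[1, 2], [3, 4]], 3.9))  # [[1, 2], [1, 2], [1, 2]]
print(expand([[5]], [[0], [0]]))      # [[5], [5]]
```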

FILE (mnemonic: FIL)

The FILE operation has anywhere from one to twenty-one addresses
as parameters. FILE is used to cause an end-of-file mark to be
written at the end of a SEQUENTIAL file. This operation is generally
most useful to the user who wishes to place more than one file on
his own physical tape. Since any meaningful use of the FILE
operation requires the addition of appropriate IBM 360 JCL cards,
all but the most experienced users should see a consultant in the
SOUPAC office before using this operation. Example:

FILE (SEQ5).


GENERATE (mnemonic: GEN)

The GENERATE operation generates a single row vector with the
floating point numbers the user specifies. The first operand is
the output address. Remaining parameters are as many floating
point numbers as the user wishes. Example:

GENERATE (SEQ2) *1.* *2.* *4.* *8.* *16.* *32.*.
HORIZONTAL AUGMENT (mnemonic: HOR)

The HORIZONTAL AUGMENT operation has from three to twenty-one
address parameters. The last address is the output address; all
other addresses are for input. Each input matrix must have the
same number of rows as all other input matrices. An address may be
used more than once as an input address.

Input matrices from the first matrix through the next-to-last
matrix are placed side by side from left to right, and the result
goes to the last address. Example:

HORIZONTAL AUGMENT (SEQ1)(SEQ3)(SEQ2)(I).


IDENTITY (mnemonic: IDE)

The IDENTITY operation has two parameters. An identity matrix,
of order specified by a fixed point number as the first operand, is
output to the address specified by the second operand. Example:

IDENTITY (20)(SEQ3).

INPUT (mnemonic: INP)

The INPUT operation will input formatted or non-formatted
records from any available device. This option is primarily for
reading card images or other similar data, usually on the user's own
tape, which would be awkward to input in the typical card deck manner.

Never input to I (see SPECIAL COMMENTS) using the INPUT operation
unless both the number of rows and the number of columns of the
input matrix are specified as parameters on the INPUT operation
parameter card.

The parameters for INPUT are the input address, the output
address, the number of rows of the input matrix (optional in
most cases), the number of columns, and optionally the format enclosed
in quotation marks. Examples:

INPUT (SEQ1)(SEQ2/PRINT)(20)(5) "(10F8.3)".
INPUT (SEQ2) (SEQ3) ( ) (9).

INVERT (mnemonic: INV)

The INVERT operation inverts a non-singular real matrix.
The INVERT operation has five subparameters, the last three of
which are optional. The first parameter is the address of the matrix
to be inverted, and the second parameter is the output address of
the result. (The incore address option described in section III.A,
SPECIAL COMMENTS, may be used for input, output, or both.)
To have the determinant of the original matrix printed out, code a
(1) as the third parameter.

The inversion technique used is the Gauss-Jordan method with
pivot elements assumed to be on the main diagonal. If it is
desired that the inversion technique perform row and column
interchange, for the purpose of picking as pivot elements those with
the largest absolute value at each step of the elimination procedure,
code a (1) as the fourth parameter. The default case, pivot elements
assumed to be on the main diagonal, executes faster than when row
and column interchange is performed. For those real symmetric
matrices which have the property that the largest elements are
necessarily on the main diagonal (e.g., correlation, cross-products,
and variance-covariance matrices), numerical accuracy of the results is
not significantly different between the two options. For general
matrices whose specific properties are not known, using row and
column interchange will probably produce more accurate results.

The fifth argument is a floating point number enclosed in asterisks
which is to be used as the criterion for singularity. If the absolute
value of any pivot element is less than the criterion for singularity,
the matrix is assumed to be singular. If no value is specified, or
if  *0.*  is  specified  as  the  fifth  parameter,  a  default  value  of  10~ 
is  used  to  test  for  singularity. 

INVERT  destroys  any  previous  use  of  the  incore  address  option. 
All  calculations  are  done  in  double  precision. 

INVERT also has the ability to solve a set of simultaneous linear
equations if a unique solution exists. To solve the system indicated
by the matrix equation

$AX = Y$

input to the INVERT suboperation a matrix which contains $[A : Y]$ (i.e.,
the matrix A with the constant terms Y appended as the last column or
columns). The resulting output of the INVERT suboperation will contain
the solution

$X = A^{-1}Y$

The Y above may be more than one column vector, in which case each
resulting column vector of X will be the solution for the corresponding
column of Y. Examples:

INVERT (SEQ1)(SEQ2).

INVERT (SEQ4)(SEQ3)(1)(1).

INVERT (SEQ2)(SEQ4)(1)( ) *10.E-5*.

INVERT (SEQ5)(SEQ1)( )(1) *.0000001*.
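The Python sketch below shows Gauss-Jordan elimination applied to an augmented matrix [A : Y], the idea behind INVERT's handling of simultaneous equations. It is an illustration under stated assumptions, not the SOUPAC routine: gauss_jordan_solve is a hypothetical name, its optional pivoting swaps rows only (partial pivoting), whereas INVERT's fourth parameter requests row and column interchange, and the default eps value is an assumption. Passing the identity matrix as Y returns the inverse of A.

```python
def gauss_jordan_solve(a, y, pivot=False, eps=1e-6):
    """Solve A X = Y by Gauss-Jordan elimination on the augmented matrix [A : Y].

    pivot=False takes each pivot from the main diagonal; pivot=True swaps in
    the row holding the largest |pivot| (row interchange only).
    eps is the singularity criterion (its default here is an assumption).
    Returns (X, determinant of A).
    """
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in y[i]] for i in range(n)]
    det = 1.0
    for k in range(n):
        if pivot:
            p = max(range(k, n), key=lambda r: abs(aug[r][k]))
            if p != k:
                aug[k], aug[p] = aug[p], aug[k]
                det = -det  # each row swap flips the determinant's sign
        piv = aug[k][k]
        if abs(piv) < eps:
            raise ValueError("pivot below singularity criterion")
        det *= piv
        aug[k] = [v / piv for v in aug[k]]  # scale pivot row so the pivot is 1
        for r in range(n):
            if r != k and aug[r][k] != 0.0:
                f = aug[r][k]
                aug[r] = [vr - f * vk for vr, vk in zip(aug[r], aug[k])]
    return [row[n:] for row in aug], det


x, det = gauss_jordan_solve([[2, 1], [1, 3]], [[3], [5]])
print(x, det)  # approximately [[0.8], [1.4]] 5.0
```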

KRONECKER PRODUCT (mnemonic: KRO)

The KRONECKER PRODUCT operation forms the Kronecker product
of two matrices and outputs the result to an output address.
The resulting output matrix is composed of $m_1 \times n_1$ submatrices,
where $m_1$ and $n_1$ are the dimensions of the first input matrix.
Each submatrix has the size $m_2 \times n_2$, where $m_2$ and $n_2$ are the
dimensions of the second input matrix. Note that the output
matrix has the dimensions $m_1 m_2 \times n_1 n_2$. Each submatrix is the
result of the scalar product $a_{ij}B$.

For example, if matrix A is on S1 and B is on S2, where

         1    2
A =     -1    1
         0   .5

and

B =      1    3
         2    4

the result of executing the MATRIX statement

KRONECKER (S1) (S2) (S3).

would be the following matrix on S3:

     1    3    2    6
     2    4    4    8
    -1   -3    1    3
    -2   -4    2    4
     0    0   .5  1.5
     0    0    1    2
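The Kronecker product example above can be reproduced with a one-line Python sketch; kronecker is a hypothetical name, and block (i, j) of the result is a_ij times B, as described in the text.

```python
def kronecker(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] times B."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


A = [[1, 2], [-1, 1], [0, 0.5]]
B = [[1, 3], [2, 4]]
for row in kronecker(A, B):
    print(row)
```

The nesting order of the comprehension (rows of A, then rows of B; within a row, elements of A, then elements of B) is what lays the blocks out in the order shown in the example.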

LABEL  (mnemonic:   LAB) 

The LABEL operation is used to store a title and column labels at
a SOUPAC address for later use within a MATRIX program. The title
is limited to 128 characters. Labels are limited to eight characters.

The first parameter is the address where the title and labels
are to be stored. This is then followed by the title and labels,
each enclosed in quotation marks. Only one label set is active at
any one time. Hence, each use of LABEL overrides all previous uses.
Labels generated within a MATRIX program may not be passed to other
programs. (Note: The incore address option may be used to store a
title and label set if desired.)

To use a set of labels which have been stored, use (L) after the
print portion of the output matrix to be labelled. For example, the
following statement pair will label the result of a HORIZONTAL AUGMENT.


LABEL(S5) "SAMPLE DATA" "ID" "AGE" "HEIGHT" "WEIGHT".
HORIZ(S1)(S2)(S3/P(L)).

S5 is being used as a convenient place to store the labels. This
presumes that S5 is not being used for anything else and is available.

LAG (mnemonic: LAG)

The LAG operation has the following operands: an input address,
an integer specifying the number of lag periods to be added, an
output address, and index sets specifying which variables are to be
lagged.

For  example,  suppose  we  are  interested  in  lagging  the  fourth 
variable  of  a  five  variable  matrix  and  suppose  that  we  want  three 
lag  periods.   First  it  should  be  noted  that  the  resulting  output 
matrix  will  necessarily  have  three  fewer  rows  (observations)  than 
the  input  matrix. 
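The lagging rule described for LAG, in which the first rows are dropped and the lagged copies of each named variable are placed immediately after it, can be sketched in Python as follows. The name lag and the list-of-rows representation are assumptions for illustration, not SOUPAC internals.

```python
def lag(matrix, periods, lag_columns):
    """Insert `periods` lagged copies right after each named (1-based) column.

    The first `periods` rows are dropped, since their lagged values would
    refer to rows before the start of the data.
    """
    lag_columns = set(lag_columns)
    out = []
    for t in range(periods, len(matrix)):
        row = []
        for j, value in enumerate(matrix[t], start=1):
            row.append(value)
            if j in lag_columns:
                # values of column j at t-1, t-2, ..., t-periods
                row.extend(matrix[t - s][j - 1] for s in range(1, periods + 1))
        out.append(row)
    return out


s1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]]
print(lag(s1, 2, [2, 3])[0])  # [7, 8, 5, 2, 9, 6, 3]
```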

If we use

LAG(S1)(3)(S2)(4).

the resulting t-th row of the output address would be

$X_{t,1}\;\; X_{t,2}\;\; X_{t,3}\;\; X_{t,4}\;\; X_{t-1,4}\;\; X_{t-2,4}\;\; X_{t-3,4}\;\; X_{t,5}$

As a concrete example, if the following input is on S1

     1    2    3
     4    5    6
     7    8    9
    10   11   12
    13   14   15
    16   17   18

use of either the statement

LAG(S1)(2)(S2)(2)(3).

or

LAG(S1)(2)(S2)(2,3).

will result in the following matrix being output onto S2:

     7    8    5    2    9    6    3
    10   11    8    5   12    9    6
    13   14   11    8   15   12    9
    16   17   14   11   18   15   12

LOWER TRIANGLE (mnemonic: LOW)


The  LOWER  TRIANGLE  instruction  copies  a  matrix  from  one 
address  to  another  and  sets  all  elements  which  are  above  the 
main  diagonal  to  zero.   It  is  possible  to  indicate  if  it  is 
desired  that  the  main  diagonal  elements  also  be  set  to  zero. 

The LOWER TRIANGLE instruction has three operands: an
input address, an output address, and an integer option flag.
If the option flag is omitted or is zero, the main diagonal
elements are included as part of the lower triangle. If the
option flag is non-zero, the main diagonal elements are set to
zero. Examples:

LOWER (S1) (S3).
LOWER (S2) (S4) (1).
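A Python sketch of the LOWER TRIANGLE rule, including the option flag that also zeroes the main diagonal, follows; the function name lower_triangle is hypothetical.

```python
def lower_triangle(matrix, option=0):
    """Copy `matrix`, zeroing elements above the main diagonal.

    A non-zero option also zeroes the main diagonal elements.
    """
    return [[x if (j < i or (j == i and not option)) else 0
             for j, x in enumerate(row)]
            for i, row in enumerate(matrix)]


m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(lower_triangle(m))     # [[1, 0, 0], [4, 5, 0], [7, 8, 9]]
print(lower_triangle(m, 1))  # [[0, 0, 0], [4, 0, 0], [7, 8, 0]]
```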

MAXIMUM (mnemonic: MAX)

The MAXIMUM operation has three operands: an input address, an
output address, and an option indicator.

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single 
row  vector  containing  the  maximum  element  of  each  column  of  the 
input  matrix.   Specifying  no  option  is  equivalent  to  specifying  option 
0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single 
column  vector  containing  the  maximum  element  of  each  row  of  the 
input  matrix. 

If  the  option  is  specified  as  any  number  other  than  0  or  1,  a 
single  element  matrix  is  output  which  contains  the  maximum  element 
of  the  entire  matrix.  Examples: 

MAX  (SEQ1)(SEQ3). 
MAX (SEQ4)(SEQ2)(2).
MINIMUM (mnemonic: MIN)

The MINIMUM operation has three operands: an input address, an
output address, and an option indicator.

If option 0 is specified, the resulting output matrix is a
single row vector containing the minimum element of each column
of the input matrix. Specifying no option is equivalent to
specifying option 0.

If  option  1  is  specified,  the  resulting  output  matrix  is  a 
single  column  vector  containing  the  minimum  element  of  each  row 
of  the  input  matrix.   Examples: 

MIN  (SEQ1)(SEQ3). 
MIN  (SEQ1)(SEQ2)(2). 

MOVE  (mnemonic:  MOV) 

The MOVE operation moves (actually copies) a matrix from one
SOUPAC standard input source to another. If reading from SEQUENTIAL,
the MOVE operation assumes that the data set was created using SOUPAC
conventions, i.e., by some SOUPAC program. If the input source is
CARDS, the input deck must be preceded by a correct DATA format
statement and terminated by an END# card.

Never MOVE from CARDS to I (see SPECIAL COMMENTS) unless both the
number of rows and the number of columns of the input matrix are
specified at the front of the data deck.

The  operation  has  between  two  and  twenty-one  addresses  as 
parameters.   The  first  address  is  the  input  address.  All  remaining 
addresses  are  output  addresses.  Examples: 

MOVE (CARDS)(SEQ1)(SEQ2).
MOVE (CARDS)(SEQ1).

MULTIPLY (mnemonic: MUL)


The MULTIPLY operation has three addresses for parameters. A
matrix multiplication is performed between the matrices on the first
two addresses and the result is stored in the third address. The
MULTIPLY operation permits the same address to be used as the
input address for both the first and second operands. This usage is
equivalent to using the SQUARE suboperation.

The incore address option may not be used for input of the first
operand if the first operand is different from the second. The
incore address option may never be used for output of the result.
The MULTIPLY operation destroys any matrix which has been stored
in core using the incore address option (see section III.A, SPECIAL
COMMENTS). All calculations are done in double precision. Examples:

MULTIPLY (SEQ1)(SEQ2)(SEQ3).
MULTIPLY (SEQ1)(SEQ1)(SEQ2).
MULTIPLY (SEQ2)(I)(SEQ5).
OUTPUT (mnemonic: OUT)

The OUTPUT operation outputs a matrix, one row at a time, with or
without format control, to a user-specified data set. If format
control is used, the output address specified should not be used
anywhere else in the current SOUPAC job step except with options
which also perform formatted I/O (e.g., the INPUT and OUTPUT operations
of MATRIX). The syntax of OUTPUT is two addresses, the input and
output addresses respectively, optionally followed by the desired
format enclosed in quotation marks. Examples:

OUTPUT (SEQ1)(SEQ5) "(20E15.7)".
OUTPUT (SEQ2)(SEQ3).

PARTITION  (mnemonic:   PAR) 

The PARTITION operation is used to select a submatrix of an
original input matrix. The first operand is the input address
and the second operand is the output address. The next four
parameters specify, in order, the beginning column of the partition,
the ending column of the partition, the beginning row of the partition,
and the ending row of the partition. If either beginning parameter is
left out, the partition begins with the first row (or column). If
either ending parameter is left out, the partition ends with the last
row (or column). Examples:

PARTITION (SEQ5)(SEQ2)(5)(6)(2)(50).
PARTITION (SEQ4)(SEQ2)(3)(40).
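Here is a minimal Python sketch of the PARTITION parameters. It takes 1-based, inclusive bounds in the order the text gives (begin column, end column, begin row, end row), and `None` stands for an omitted parameter. The name `partition` is hypothetical. The sketch simply raises an error for out-of-range bounds; how SOUPAC reports such bounds is not described here.

```python
def partition(matrix, first_col=None, last_col=None, first_row=None, last_row=None):
    """Sketch of PARTITION on a list-of-rows matrix (1-based, inclusive bounds)."""
    nrows = len(matrix)
    ncols = len(matrix[0])
    # Omitted beginning parameters default to 1; omitted endings to the last row/column.
    c0 = 1 if first_col is None else first_col
    c1 = ncols if last_col is None else last_col
    r0 = 1 if first_row is None else first_row
    r1 = nrows if last_row is None else last_row
    if not (1 <= c0 <= c1 <= ncols and 1 <= r0 <= r1 <= nrows):
        raise ValueError("partition bounds lie outside the input matrix")
    # Convert 1-based inclusive bounds into Python slices.
    return [row[c0 - 1:c1] for row in matrix[r0 - 1:r1]]


m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]
print(partition(m, 2, 3, 2, 3))  # [[6, 7], [10, 11]]
print(partition(m, 3))           # columns 3 to the end, all rows
```

The second call mirrors the shape of `PARTITION (SEQ4)(SEQ2)(3)(40).`, where only the column bounds are given.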

PERMUTATION  (mnemonic:   PER) 

The PERMUTATION operation permutes, according to an option, the rows,
the columns, or both the rows and columns of an input matrix. The resulting
matrix is output to the second operand output address.

The  third  operand  is  an  option  flag.   If  the  option  flag  is 
specified  as  zero,  columns  of  the  matrix  are  permuted.   If  the  option 
flag  is  specified  as  one,  rows  of  the  matrix  are  permuted.   If  the 
option  flag  is  specified  as  other  than  zero  or  one,  both  rows  and 
columns  are  permuted. 

The  order  of  columns  or  rows  of  the  output  matrix  is  determined 
by  index  sets.   Examples: 

If the statement

PERMUTE (S1)(S2)(0)(5)(1,4).

is used and S1 has the matrix

     1.    2.    3.    4.    5.
    -1.    0.    2.    .1    .3

the resulting output matrix will be

     5.    1.    2.    3.    4.
     .3   -1.    0.    2.    .1
If the input matrix had been

     1.    2.    3.    4.    5.    6.
    -1.    0.    2.    .1    .3    4.

the resulting output matrix would be the same as the output matrix already
listed. Note that this implies that only those columns or rows which are
explicitly listed will be output.

If the statement

PER (S1)(S2)(1)(3)(2)(1).

is used and the matrix on S1 is

     1    2    3
     4    5    6
     7    8    9

the resulting output matrix will be

     7    8    9
     4    5    6
     1    2    3

If the statement

PER (S1)(S2)(2)(3)(2)(1).

is used on the matrix


the resulting output matrix will be


Warning: When permuting rows alone or rows and columns, it may be necessary
to include a #DEFINE prolog card for DU9 with appropriate parameters. In
addition, space on FT99F001 should be checked. See a SOUPAC consultant for
assistance whenever permuting rows.
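The examples above show that index sets are taken in the order listed, a pair such as (1,4) meaning 1 through 4, and unlisted rows or columns are dropped. A Python sketch of that rule follows. It writes each index set as a tuple, `(5,)` or `(1, 4)`; this representation is an assumption, and `permute` and `expand_index_sets` are hypothetical names. The text does not say how index sets are divided between rows and columns when both are permuted, so the sketch models only options 0 and 1.

```python
def expand_index_sets(index_sets):
    """Turn index sets such as [(5,), (1, 4)] into the order [5, 1, 2, 3, 4]."""
    order = []
    for index_set in index_sets:
        if len(index_set) == 1:
            order.append(index_set[0])
        else:
            first, last = index_set
            order.extend(range(first, last + 1))
    return order


def permute(matrix, option, index_sets):
    """Sketch of PERMUTATION for option 0 (columns) and option 1 (rows)."""
    order = expand_index_sets(index_sets)  # 1-based positions
    if option == 0:
        # Reorder columns; columns not listed are left out.
        return [[row[j - 1] for j in order] for row in matrix]
    if option == 1:
        # Reorder rows; rows not listed are left out.
        return [list(matrix[i - 1]) for i in order]
    raise NotImplementedError("combined row and column permutation is not modelled")


s1 = [[1.0, 2.0, 3.0, 4.0, 5.0],
      [-1.0, 0.0, 2.0, 0.1, 0.3]]
print(permute(s1, 0, [(5,), (1, 4)]))
# [[5.0, 1.0, 2.0, 3.0, 4.0], [0.3, -1.0, 0.0, 2.0, 0.1]]
```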

PRINT  (mnemonic:   PRI) 

The PRINT operation prints a matrix, one row at a time, under the
control of a user-supplied format. Formats follow FORTRAN IV conventions,
with the added restriction that formats are limited to 592 characters.

The  first  parameter  is  the  address  of  the  matrix  to  be  printed.   The 
second  parameter  is  the  format  enclosed  in  quotation  marks.   (Warning:   Allow 
for  carriage  control  as  the  first  character  in  output  lines.   A  print  line  has 
133  characters.)   Example: 


PRINT (S2) "(' ',3F20.10)".
PUNCH (mnemonic: PUN)

The PUNCH operation has the same syntax as PRINT and is used to punch
out a matrix under the control of a user-supplied format. (Warning: When
punching cards, remember that there is room for only 80 characters per card.)

The  PUNCH  operation  always  punches  two  cards  in  addition  to  the  actual 
data  deck.  At  the  front  of  the  data  is  punched  a  DATA  format  card,  and  at 
the  end  of  the  data  is  punched  an  END#  card.   Example: 

PUNCH  (S1)"(8F10.2)". 

RECIPROCAL (mnemonic: REC)

The RECIPROCAL operation has two operands, an input address and an output
address. The reciprocals of the elements of the first matrix form an output
matrix, which is output to the second operand address. Example:

RECIPROCAL (S1)(S3).

REMAP (mnemonic: REM)

The  REMAP  instruction  is  used  to  change  single  rows  of  input  into  several 
rows  of  output  or  to  change  several  rows  of  input  into  single  rows  of  output. 
The REMAP instruction has three operands: an input address, an output address,
and an integer which is to be used as the column dimension of the output
address.

Case 1: Map single rows of input into several rows of output. In this
case the column dimension of the output address must be less than, and must
divide, the column dimension of the input address matrix. For example, if we
have on S1 the data matrix

     1.    2.    3.    4.    5.    6.

and use the instruction

REMAP(S1)(S2)(2).

the resulting output to S2 would be

     1.    2.
     3.    4.
     5.    6.


Notice that using the instruction

REMAP(S1)(S2)(4).

would be an error since 4, the column dimension of the output matrix, does not
divide 6, the column dimension of the input matrix.

Case  2:   Map  several  rows  of  input  into  a  single  row  of  output.   In  this 
case,  the  column  dimension  of  the  output  matrix  must  be  a  multiple  of  the 
column  dimension  of  the  input  matrix.   Furthermore,  if  the  multiple  is  some 
value  m,  m  must  divide  the  number  of  rows  of  the  input  matrix.   For  example, 
if  we  have  on  S3  the  data  matrix: 
     1.    2.
     3.    4.
     5.    6.

and use the instruction

REMAP(S3)(S4)(6).

the resulting output matrix on S4 would be

     1.    2.    3.    4.    5.    6.

Notice that the number of columns of the output matrix is three times
the column dimension of the input matrix. Notice also that three divides
the number of rows of the input matrix.
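Both REMAP cases behave like reading the input matrix row by row and refilling rows of the new width. The Python sketch below assumes that row-major reading, which reproduces both examples above. It checks the divisibility rules from the text before reshaping. The name `remap` is used only for illustration.

```python
def remap(matrix, new_cols):
    """Sketch of REMAP: refill the elements, read row by row, into rows of `new_cols`."""
    nrows = len(matrix)
    ncols = len(matrix[0])
    if new_cols < ncols:
        # Case 1: split rows; the new width must divide the old width.
        if ncols % new_cols != 0:
            raise ValueError(f"{new_cols} does not divide {ncols}")
    else:
        # Case 2: join rows; the new width is m times the old width,
        # and m must divide the number of input rows.
        if new_cols % ncols != 0 or nrows % (new_cols // ncols) != 0:
            raise ValueError("invalid column dimension for REMAP")
    flat = [value for row in matrix for value in row]  # row-major order (assumed)
    return [flat[k:k + new_cols] for k in range(0, len(flat), new_cols)]


print(remap([[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]], 2))  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(remap([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], 6))  # [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]]
```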

REWIND  (mnemonic:   REW) 

The REWIND operation has anywhere from one to twenty-one addresses as
parameters. REWIND is used to rewind a sequential file. The REWIND operation
needs to be used only with the INPUT operation, when it is desired to reread
a formatted input file. Example (to input the same formatted file from S3
onto both S1 and S2 under control of different formats):

INPUT  (S3)(S1)(  )(5)  "(10X,5F10.0)". 

REWIND  (S3). 

INPUT  (S3)(S2)(  )(8)  "(8F10.0)". 

ROW  DELETE  (mnemonic:   ROW) 

The  ROW  DELETE  operation  specifies  which  rows  of  an  input  matrix  are  to 
be  deleted  before  sending  the  result  to  the  output  address.   The  first 
parameter  is  the  input  address.   The  second  parameter  is  the  output  address. 
Rows  to  be  deleted  are  specified  by  index  sets  following  the  output  address. 
Example: 

ROW DELETE (I)(S3)(1)(3)(7)(8)(11)(45).

RECIPROCAL  OF  SQUARE  ROOT  OF  DIAGONAL  (mnemonic:   RSD) 

The RSD operation has two operands, an input address and an output
address. The reciprocals of the square roots of the main diagonal elements
of the first matrix form a single row vector, which is output
to the second operand address. Example:

RSD (S1)(S3).
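For reference, RSD computes 1/sqrt(a_ii) for each main diagonal element a_ii. A short Python sketch follows, assuming a square input with positive diagonal elements. What SOUPAC does with zero or negative diagonals is not stated here; in the sketch, `math.sqrt` raises `ValueError` for a negative element and a zero element raises `ZeroDivisionError`. The name `rsd` is illustrative.

```python
import math


def rsd(matrix):
    """Sketch of RSD: a 1 x n row vector of 1/sqrt(a_ii) for a square matrix."""
    return [[1.0 / math.sqrt(matrix[i][i]) for i in range(len(matrix))]]


print(rsd([[4.0, 1.0], [1.0, 25.0]]))  # [[0.5, 0.2]]
```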
SCALAR (mnemonic: SCA)

The  SCALAR  operation  has  three  parameters.   A  floating  point  number 
specified  by  the  second  operand  is  multiplied  by  every  element  of  the  matrix 
specified  by  the  first  operand.   The  result  goes  to  the  third  operand  address. 

The  second  operand  may  be  either  a  floating  point  number  enclosed  in 
asterisks,  or  a  standard  SOUPAC  input  address.   If  an  address  is  specified, 
the  first  element  of  the  matrix  at  the  address  is  used  for  the  floating  point 
number. Examples:


SCALAR (S1) *2.* (S2).
SCALAR (S14)(S3)(S2).

SINGLE  (mnemonic:   SIN) 

The SINGLE operation has anywhere from one to twenty-one addresses as
parameters. Listing an address as a parameter causes any matrices written
on that address to be written in single precision. MATRIX stores all data
matrices in double precision unless the user specifies otherwise with the
suboperation SINGLE. Listing an address in a SINGLE statement in one
MATRIX program has no effect on any other MATRIX program. Example:

SINGLE (S1)(S2)(S4).


SUBTRACT  (mnemonic:   SUB) 

The SUBTRACT operation has from three to twenty-one address
parameters. The last address is the output address; all other addresses
are for input. Each input matrix must have the same number of rows
and columns as every other input matrix.

Elements  of  the  second  matrix  through  the  next  to  last  matrix  are 
subtracted  from  corresponding  elements  of  the  first  matrix.   Output 
goes  to  the  last  address.  An  address  may  be  used  more  than  once  as 
an  input  address.   Examples: 

SUBTRACT (SEQ1)(SEQ3)(SEQ4).

SUBTRACT (SEQ4)(SEQ2)(SEQ3)(SEQ1)(SEQ5).
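The SUBTRACT rule, in which every later input is subtracted from the first, is sketched below in Python. The sketch takes only the input matrices; the SOUPAC output address becomes the return value. The name `subtract` is hypothetical. It also shows that the same matrix may be passed more than once.

```python
def subtract(*matrices):
    """Sketch of SUBTRACT: the first input minus every later input, element by element."""
    first, *others = matrices
    if not others:
        raise ValueError("SUBTRACT needs at least two input matrices")
    shape = (len(first), len(first[0]))
    for m in others:
        if len(m) != shape[0] or any(len(row) != shape[1] for row in m):
            raise ValueError("all input matrices must have the same shape")
    result = [list(row) for row in first]  # copy so the first input is left unchanged
    for m in others:
        for i, row in enumerate(m):
            for j, value in enumerate(row):
                result[i][j] -= value
    return result


a = [[5.0, 7.0]]
b = [[1.0, 2.0]]
c = [[1.0, 1.0]]
print(subtract(a, b, c))  # [[3.0, 4.0]]
print(subtract(a, a))     # [[0.0, 0.0]]
```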

SUM   (mnemonic:   SUM) 

The SUM operation has three operands: an input address, an output
address, and an option indicator.

If  option  0  is  specified,  the  resulting  output  matrix  is  a  single 
row  vector  containing  the  column  sum  of  each  column  of  the  input  matrix. 
Specifying  no  option  is  equivalent  to  specifying  option  0. 

If  option  1  is  specified,  the  resulting  output  matrix  is  a  single 
column  vector  containing  the  row  sum  of  each  row  of  the  input  matrix. 

If the option is specified as any number other than 0 or 1, a
single element matrix is output which contains the sum of all elements
over the entire matrix. Examples:

SUM  (SEQ1)(SEQ3). 
SUM  (SEQ1)(SEQ2)(2). 

TRANSPOSE  (mnemonic:   TRA) 

The  TRANSPOSE  operation  transposes  a  matrix  (interchanges  rows 
and  columns).  TRANSPOSE  destroys  any  previous  usage  of  the  incore 
address  storage.   The  two  parameters  for  TRANSPOSE  are  first  the 
input  address  and  second  the  output  address  of  the  result.  Example: 

TRANSPOSE (SEQ4)(SEQ2).
UPPER TRIANGLE (mnemonic: UPP)

The UPPER TRIANGLE instruction copies a matrix from one
address to another and sets all elements below the main
diagonal to zero. An option flag controls whether the main
diagonal elements are also set to zero.

The UPPER TRIANGLE instruction has three operands: an input
address, an output address, and an integer option flag. If the
option flag is omitted or is zero, the main diagonal elements
are included as part of the upper triangle. If the option flag
is non-zero, the main diagonal elements are set to zero. Examples:

UPPER (S1)(S3).
UPPER (S2)(S4)(1).


VECTOR  (mnemonic:  VEC) 

The VECTOR operation has two operands, an input address and an
output  address.   A  single  vector  from  the  first  location  is  used 
to  form  a  diagonal  matrix  which  is  output  to  the  second  address. 
If  the  input  matrix  has  more  rows  than  columns,  the  first  column 
vector  is  used  to  form  the  diagonal  matrix.   If  the  input  matrix 
has  more  columns  than  rows,  the  first  row  is  used  to  form  the 
diagonal  matrix.   Example: 

VECTOR  (SEQ1)(SEQ3). 


VERTICAL  AUGMENT  (mnemonic:   VER) 


The VERTICAL AUGMENT operation has from three to twenty-one
address parameters. The last address is the output address; all
other addresses are for input. Each input matrix must have the
same number of columns as every other input matrix. An address
may be used more than once as an input address.

Input matrices from the first address through the next to last
address are stacked top to bottom and the result goes to the last
address. Example:

VERTICAL AUGMENT (SEQ1)(SEQ2)(SEQ3).


III. Special Comments

A.   Incore  Address  Option 

Besides the standard SOUPAC addresses, MATRIX also recognizes
the additional address I. The I symbol as an address represents
internal storage in the machine.

An obvious use of this feature is to cut down on I/O time for
matrices which are to be used in future operations within the
current MATRIX program. The internal storage feature also saves
time when the user desires his output from an operation to be
printed or punched. The user must keep in mind that data cannot
be passed to subsequent programs with the I storage. The user
should also be aware of the restrictions on I storage mentioned
above for some of the subparameter operations (see INVERT, MULTIPLY,
SQUARE, and TRANSPOSE). In no case is this option recommended
for matrices which do not fit within the memory available to the
MATRIX program in the region size being used. Examples:

1) To add the matrix on SEQ1 to the matrix on SEQ2, leaving
the result in core and also printing the result, code as
follows:

ADD (SEQ1)(SEQ2)(I/PRINT).

2) To vertically augment the matrices in core, on SEQ1, and on
SEQ2, storing the result on SEQ4, code as follows:

VERTICAL AUGMENT (I)(SEQ1)(SEQ2)(SEQ4).


B.   Labeled  Output 

Provided  in  the  MATRIX  program  is  the  facility  to  title  and  put 
column  labels  on  any  matrix  which  is  printed  using  normal  SOUPAC 
print  conventions.   The  labeling  feature  is  not  allowed  with  the 
PRINT  matrix  operation. 

To  use  the  labeling  feature,  it  is  first  necessary  to  put  the 
title  and  labels  in  a  temporary  storage  area.   This  is  accomplished 
with the LABEL operation (see Subparameters).

To use a label which has been placed in a temporary storage area,
code (L) after the print portion of the output address. If F
format is also desired, code either (F,L) or (L,F) after the print.
Examples:

1) To move (copy) the matrix on SEQ1 onto SEQ5, printing the
result in F format with title and column labels, code as
follows:

MOVE (SEQ1)(SEQ5/PRINT(F,L)).
2) To add the matrices on SEQ1 and SEQ2, storing the result on
SEQ3 and also printing the result with title and column
labels, code as follows:

ADD (SEQ1)(SEQ2)(SEQ3/PRINT(L)).


3) To transpose the matrix stored on SEQ1 to SEQ2, printing
out the result in F format with title and column labels,
and punching out a card deck of the transposed matrix,
code as follows:

TRANSPOSE  (SEQ1)(SEQ2/PRINT(L,F)/X). 


TRANSFORMATIONS 


I.   Purpose 

TRANSFORMATIONS is a data manipulation program. Unlike MATRIX, which
performs operations on a complete matrix, TRANSFORMATIONS operates upon
matrices one row at a time. This strategy provides almost unlimited
flexibility in transforming your data. Some of the general uses include creating
new variables as functions of present variables, recoding or collapsing data,
and reordering or eliminating variables. More advanced uses are facilitated
by an instruction set which allows testing and branching depending on single
variable characteristics or relations between variables, indirect addressing
(FLAG-NOTATION), and inputting and outputting to and from different sequential
units during the program.

TRANSFORMATIONS serves several purposes in the SOUPAC system. First, it
can be used as a stand-alone program to perform computations on your input
data and yield the final results. Also, it can be used to prepare your data
for input into another SOUPAC program or to modify the output of one
program for input into another.

II.   Description 

The TRANSFORMATIONS program reads in one row of data and executes the
program until the END PROGRAM card or the LAST (last card) instruction is
reached. It continues to read in data one row at a time, executing the same
program for each successive row until all the rows of data have been processed.

There  are  2000  variables  allowed  in  the  TRANSFORMATIONS  program.   Before 
each  row  of  data  is  read  into  the  program,  variables  1  through  1000  are  set 
to  zero.   Variables  1001  through  2000  are  set  to  zero  only  before  the  initial 
row of data is read. Normally, manipulations performed on successive
rows of data are independent of each other, but when values are moved to
variables over 1000, information can be passed from one row to another or
maintained during the processing of the whole matrix. This feature provides
for accumulating sums or other totals, as well as the capability of having
information from previous rows of data determine the kinds or extent of
manipulations to be performed on the current row.
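The variable rules above (1 through 1000 cleared before every row, 1001 through 2000 cleared only once) can be sketched as a Python loop. Two things here are assumptions for illustration: each data row fills variables 1, 2, 3, ... in order, and the program is a Python callable. `run_transformations` and `running_total` are hypothetical names. The example keeps a running total in variable 1001, roughly what `ADD (1)(1001)(1001).` would do.

```python
def run_transformations(rows, program):
    """Sketch of the TRANSFORMATIONS row loop.

    `program` is called once per data row with the variable array and an
    output list. Variable k lives at variables[k]; index 0 is unused.
    """
    variables = [0.0] * 2001        # variables 1-2000, all zero before the first row
    outputs = []
    for row in rows:
        for k in range(1, 1001):    # variables 1-1000 are cleared before every row
            variables[k] = 0.0
        for k, value in enumerate(row, start=1):  # assumed: the row fills variables 1..n
            variables[k] = value
        program(variables, outputs)
    return outputs


def running_total(v, out):
    # Hypothetical program: v1001 accumulates v1 across rows, then 1 and 1001 are output.
    v[1001] = v[1] + v[1001]
    out.append((v[1], v[1001]))


print(run_transformations([[2.0], [3.0], [5.0]], running_total))
# [(2.0, 2.0), (3.0, 5.0), (5.0, 10.0)]
```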

Input to the program can be specified by the parameter on the
TRANSFORMATIONS card or by the INPUT instruction. Output can only be
achieved by use of the OUTPUT instruction.

III. Parameters

The one parameter on the TRANSFORMATIONS card is the input address of the
main input matrix. This can be either CARDS or SEQUENTIAL 1-15. This card
is followed by the subparameter cards describing the transformations to be
performed. The last card must always be an END PROGRAM card. Examples:


TRA(C). 
ADD(1)(2)(3).
DIV(1)(3)(4).
OUT(P)(1,4).
ENDP 

The  main  input  matrix  is  from 
cards  and  the  output  is  printed. 


TRA(S3). 

LOG(3)(7).

SQU(4)(8).

OUT(S4)(7,8).

ENDP

The main input matrix is from SEQ 3
and the output goes to SEQ 4.
IV. Subparameter List

mnemonic  notes   operation name               examples

ABO               abort                        ABORT.
ABS       1       absolute value               ABS (1)(51).
ADD       1,2     addition                     ADD (1)(2)(52).
                                               ADD (1)(7)(11)*8.*(53).
ANG       1       angle to radians             ANG (3)(54).
A-C       1,4     arccosine                    A-C (4)(55).
                                               A-C (5)(56)*0*"BAD".
A-S       1,4     arcsine                      A-S (6)(57).
                                               A-S (7)(58)(8)"*+1".
A-T       1       arctangent                   A-T (9)(59).
C-G               computed go to               C-G (10)"A""B""C""D".
COM       1       combine                      COM (11)*10*(12)(60).
CON       3       constant                     CON (101)*4.3*.
                                               CON (102)(7).
                                               CON (103,114)(230,235)*0,5*.
COS       1       cosine                       COS (13)(61).
DIF       2       difference if                DIF (1)(5)"X""Y""Z".
                                               DIF (6)*9*"N""Z""P".
DIV       1,2,4   division                     DIV (14)*10*(62).
                                               DIV (15)(6)(63)*0*.
                                               DIV (15)(6)(63)*0*"R".
EBC       1       EBCDIC                       EBC (16)(64).
EXC       1       exchange                     EXC (6)(8).
EXI               exit                         EXIT.
EXP       1       exponent base e              EXP (17)(65).
FAC       1       factorial                    FAC (18)(66).
FIX       1       fixed point conversion       FIX (19)(67).
FLO       1       floating point conversion    FLO (20)(68).
GO                go to                        GOTO "PLACE".
IF                arithmetic if                IF (1)"*+1""EX""*+1".
INP               input from unit              INP (S2)(200).
                                               INP (S3)(301)"EOF".
                                               INP (S4)(410)"END"(17)"(17F4.0)".
LAS               last card operation          LAST.
ELO       1,4     log base e                   ELO (21)(69).
                                               ELO (21)(69)*0*"NEG".
LOG       1,4     log base 10                  LOG (22)(70).
                                               LOG (22)(70)*0*.
MAX       1,2     maximum value                MAX (2)(4)(7)*10*(9)(71).
MIN       1,2     minimum value                MIN (1)(3)(5)(72).
MOD       1,2     modular arithmetic           MOD (23)*4*(73).
MOV       1       move                         MOV (24)(74).
MUL       1,2     multiplication               MUL (1)(25)(26)(75).
                                               MUL (27)*3*(76).
NO                no operation                 NOOP.
OUT       3       output to unit               OUT (P(F))(1,30).
                                               OUT (S3)(1,5)(8)(51,85).
PER       3       permute                      PER (701)(6,10)(51,85)(2).
                                               PER (801)*1,4*(6)*1,4*(7).
RAD       1       radians to angle             RAD (28)(77).
REC       1,2     recode                       REC (29)"GT"(30)(78)*11*.
                                               REC (31)"EQ"(32)(79)*0*(79)*1*.
SIG       1       sign transfer                SIG (33)(34).
SIN       1       sine                         SIN (35)(80).
SKI       2       skip record on unit          SKI (S2)(100).
                                               SKI (S3)*1*.
SQU       1,4     square root                  SQU (36)(81).
                                               SQU (37)(82)*-1*"IM".
SUB       1,2     subtraction                  SUB (38)(39)(83).
                                               SUB (40)*2*(84).
SUM       1       summation                    SUM (1)(10)(85).
SUP               suppress warnings            SUP.
WAR               warnings on                  WARN.
XAD       5       fixed point addition         XADD (41)(42)(49).
XDI       5       fixed point division         XDIV (41)(42)(48).
XIF       5       fixed point arithmetic if    XIF (46)"P""0""T".
XMU       5       fixed point multiplication   XMUL (41)(42)(47).
XSM       5       fixed point summation        XSM (46)(49)(43).
XSU       5       fixed point subtraction      XSUB (41)(42)(46).


NOTES 

The  following  features  are  available  to  an  instruction  if  and  only  if 
the  number  appears  in  the  notes  for  that  instruction: 

1. DO-notation may be used with variables and floating point constants.

2. Floating point constants may be substituted for input variables.

3. Variable ranges may be specified instead of single variables.

4. Substitute output values and transfer labels may be used to avoid
program termination in the case of undefined output values.

5. The input variables must be in fixed point representation.
V. Transformations Labels

In TRANSFORMATIONS there are several instructions which perform some
kind of test on your data. In most cases, the result of that test causes
the program not to execute the next sequential instruction, but to branch
to some other statement in the TRANSFORMATIONS program and continue
executing with that statement. TRANSFORMATIONS labels are used to refer
to the statements to which control is transferred.

The form of these labels can be illustrated by the following example,
which skips over the divide statement if the divisor is zero.


        IF(3)"NEXT" "JUMP" "NEXT".
"NEXT"  DIV (2)(3)(4).
"JUMP"  next statement


or


        IF(3) "*+1" "*+2" "*+1".
        DIV (2)(3)(4).
        next statement

The  preceding  equivalent  examples  exhibit  the  two  types  of  TRANSFORMATIONS 
labels.   The  syntactical  rules  which  govern  the  two  types  of  labels  follow. 


Type 1

A. A type 1 label consists of eight or fewer alphanumeric characters
enclosed in a pair of quotes.

B. Alphanumeric characters consist of the letters A to Z and
the digits 0 to 9.

C. Any unique label may appear as an operand or branch address of any
number of TRANSFORMATIONS subparameter instructions.

D. Any label which appears as an operand or branch address of an
instruction must appear immediately preceding, and be part of, exactly
one TRANSFORMATIONS subparameter instruction in the present
TRANSFORMATIONS program.


Type 2

A. A type 2 label consists of a positional reference of the form
*+n enclosed in a pair of quotes.

B. The symbol * refers to the statement in which it appears.
Therefore, *+1 points to the next statement, *+2 skips
one statement, and *-1 points to the preceding statement.
C. This type of label need only appear as an operand or branch address,
not before the TRANSFORMATIONS subparameter instruction which
it references.

D. Since the statement to which you branch is always
indicated relative to the statement from which you are branching,
it is possible to point to an address which lies beyond the
end of the program or before the beginning of the program.
Needless to say, this results in an error condition.

E. Branching to "*+0" would create an infinite loop and is also illegal.
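The two label types can be summarized as a small resolver that turns a branch label into a statement position. The Python sketch below is a hypothetical illustration, not SOUPAC's own implementation. It stores a program as a list of (label, statement) pairs and enforces the rules listed above: type 1 names are 1 to 8 characters from A-Z and 0-9 and must label exactly one statement, while type 2 offsets may not be zero or leave the program.

```python
import re


def resolve_branch(statements, current, label):
    """Sketch: return the index of the statement a branch label refers to.

    statements -- list of (label or None, statement text) pairs
    current    -- index of the statement containing the branch
    label      -- type 1 name such as "NEXT", or type 2 reference such as "*+2"
    """
    if label.startswith("*"):
        offset = int(label[1:])          # "+2" -> 2, "-1" -> -1
        if offset == 0:
            raise ValueError('"*+0" would create an infinite loop')
        target = current + offset
        if not 0 <= target < len(statements):
            raise ValueError("branch points outside the program")
        return target
    if not re.fullmatch(r"[A-Z0-9]{1,8}", label):
        raise ValueError("type 1 labels are 1-8 characters from A-Z and 0-9")
    matches = [i for i, (name, _) in enumerate(statements) if name == label]
    if len(matches) != 1:
        raise ValueError(f"label {label!r} must precede exactly one statement")
    return matches[0]


program = [
    (None, 'IF(3)"NEXT" "JUMP" "NEXT".'),
    ("NEXT", "DIV (2)(3)(4)."),
    ("JUMP", "next statement"),
]
print(resolve_branch(program, 0, "JUMP"), resolve_branch(program, 0, "*+2"))  # 2 2
```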

Example:

If you had a sample with 18 variables and wanted to eliminate all
observations with missing data, you could execute either of the following
equivalent programs:


TRA(C). 

REC (1,18) "EQ"*-0.*(99)*1*.

IF (99) "BAD" "ZERO" "MISS".
"BAD"    ABORT.
"ZERO"   OUTPUT (S1)(1,18).
"MISS"   NOOP.

ENDPROGRAM 


TRA(C).

REC(1,18) "EQ"*-0.*(99)*1*.

IF (99) "*+1" "*+2" "*+3".

ABORT.

OUTPUT (S1)(1,18).

NOOP. 

END  PROGRAM 


The sample is input from cards. The second instruction scans variables
1 through 18 and recodes variable 99 to a 1 if any missing data is found.
In this case, we assume that missing data on the data cards has been coded
as blanks, which are read into the program as minus zeroes. The IF
instruction branches to one of the three labels depending on whether the
value of variable 99 is negative, zero, or positive. In this way, if missing
data was found, variable 99 will be a 1 instead of a zero and the branch will
skip over the OUTPUT instruction, causing the observation with missing data
to be deleted.
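The same missing-data filter can be sketched in Python. The sketch follows the example's premise that only blank fields produce minus zero, so a genuine zero (+0.0) is kept. Python treats `-0.0 == 0.0` as true, so the sketch checks the sign bit with `math.copysign`. `is_minus_zero` and `drop_missing` are hypothetical names, and the local `flag` stands in for variable 99.

```python
import math


def is_minus_zero(x):
    """True only for -0.0; an ordinary comparison cannot tell -0.0 from 0.0."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def drop_missing(rows, nvars=18):
    """Keep variables 1..nvars of each row unless one of them is -0.0 (missing)."""
    kept = []
    for row in rows:
        values = row[:nvars]
        flag = 1.0 if any(is_minus_zero(v) for v in values) else 0.0  # role of variable 99
        if flag == 0.0:
            kept.append(values)  # corresponds to OUTPUT (S1)(1,18).
    return kept


rows = [[1.0] * 18, [1.0] * 17 + [-0.0], [0.0] * 18]
print(len(drop_missing(rows)))  # 2: the row containing -0.0 is dropped
```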
VI. Subparameter Description

ABO 

The ABORT instruction causes immediate termination of the TRANSFORMATIONS
program  and  the  entire  SOUPAC  program.   This  instruction  is  often 
transferred  to  when  internal  tests  reveal  incorrect  data.  Example: 

ABORT. 
ABS 


The  ABSOLUTE  VALUE  instruction  takes  the  absolute  value  of  the  first 
variable  and  stores  it  into  the  second  variable.  Example: 


ABS  (3) (25). 


ADD 


The ADD instruction has from three to one hundred parameters
pointing to variables. The first variable through the next to last
variable are summed and the result is stored into the last variable.
Examples:

ADD  (6) (7) (23). 

ADD (1)(345)(7)*37.4*(100).

ANG 




The  ANGLE  TO  RADIANS  instruction  converts  the  first  variable,  which 
should  be  a  measure  of  an  angle,  into  radians  and  stores  the  result 
into  the  second  variable.  Example: 


ANG (5)(17).


A-C 


The ARCCOSINE instruction takes the arccosine of the first variable
and stores it into the second variable. The first variable must be
between minus one and one inclusive. The result will be stored in radians.
Example:


A-C (7)(34).


A-S 


The  ARCSINE  instruction  takes  the  arcsine  of  the  first  variable  and 
stores  it  into  the  second  variable.   The  first  variable  must  be 
between  minus  one  and  one  inclusive.   The  result  will  be  stored  in 
radians. Example:

A-S  (9)(13). 
A-T

The  ARCTANGENT  instruction  takes  the  arctangent  of  the  first 
variable  and  stores  it  into  the  second  variable.  The  result  will  be 
stored  in  radians.   Example: 

A-T  (8) (70). 

C-G 

The COMPUTED GO TO has from two to twenty-two parameters. The
first  parameter  contains  a  variable  and  the  following  parameters  contain 
labels.   The  basic  form  is 


C-G(v)"L1" "L2" "L3" ... "Ln"

where n is at most 21.


The variable must be floating point. If it is not of integral value,
it is truncated (all digits to the right of the decimal point are
dropped). The instruction will then branch to the label whose position
in the list is equal to the integral value of the variable. Example:


Ilpit   "t-v" 


"E'V 


C-G(7)"A"  "B" 

If  variable  7  is  equal  to  k,0   or  k,3   then  the  instruction  will  branch 

If  variable  7  is  less  than  1,  then  the  program  will  ter- 
If  it  is  over  5  the  next  instruction  will  be  executed. 


to  label  "D" 


minate . 
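The branching rule for C-G can be sketched in Python. This is a minimal illustration under stated assumptions, not SOUPAC code: the function name `computed_goto` is hypothetical, labels are passed as a list of strings, returning `None` stands for "execute the next instruction", and the "over 5" test is assumed to apply to the truncated value.

```python
import math
from typing import List, Optional


def computed_goto(value: float, labels: List[str]) -> Optional[str]:
    """Pick the label at the truncated position of value (1-based).

    Returns None when execution falls through to the next instruction,
    and raises RuntimeError where the guide says the program terminates.
    """
    if not 1 <= len(labels) <= 21:
        raise ValueError("C-G takes from 1 to 21 labels")
    position = math.trunc(value)  # drop digits right of the decimal point
    if position < 1:
        raise RuntimeError("C-G: variable less than 1, program terminates")
    if position > len(labels):
        return None  # past the end of the list: the next instruction runs
    return labels[position - 1]


labels = ["A", "B", "C", "D", "E"]
print(computed_goto(4.3, labels))  # D
print(computed_goto(6.0, labels))  # None
```

Truncation (rather than rounding) is what makes 4.0 and 4.3 land on the same label.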

CON 

The CONSTANT instruction contains from two to one hundred parameters.
The instruction is used to assign constant values to variables. The
first parameter indicates either a variable or a range of variables.
The subsequent parameters contain the fixed point constant(s) and/or
the floating point constant(s) which are assigned to the variable(s).
Examples:

CON(7)(99)   assigns the fixed point value 99 to variable 7.

CON(8)*4.3*   assigns the floating point value 4.3 to variable 8.

CON(7,8)(99)*4.3*   is equivalent to the previous two together.

The more complicated structures, which contain ranges and increments for
the variables and constants, use DO-notation, which is explained in Sec. 9.
An example of the CONSTANT instruction with that structure is included
in that section.

COS 


The  COSINE  instruction  takes  the  cosine  of  the  first  variable  and 
stores  it  into  the  second.  The  first  variable  must  be  expressed  in 
radians.  Example: 

COS  (1)(17). 
DIF


The DIFFERENCE IF instruction contains five parameters. The first
two parameters point to variables and the next three contain labels.
The second variable is subtracted from the first. If the difference is
negative the instruction branches to the first label, if the difference
is zero it branches to the second or middle label, and if the difference
is positive it branches to the third or last label. Example:

DIF (3)(5)"A" "B" "C".

If variable 3 minus variable 5 is negative, the instruction will branch
to label "A".


DIV 

The DIVIDE instruction contains from three to five parameters. In the
case of three parameters, the first variable is divided by the second variable
and the result is stored in the third variable. A division by zero will
terminate the program. Example:

DIV (1)(2)(30).

In the case of four parameters, the division will take place as normal except
when the second variable is zero. In that event, the fourth variable will
be stored into the third variable as a supplied quotient. Example:

DIV (1)(2)(30)(31).

If the fifth parameter is added, it indicates a label to be branched to
in the event of a division by zero. The branch will take place after
the supplied quotient is stored into the third variable. Example:

DIV (1)(2)(30)(31)"ZERO".
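The three forms of DIVIDE can be summarized in a small Python sketch. The name `divide` is hypothetical; it returns the value that would be stored into the third variable together with the label to branch to (`None` for no branch), and it raises an exception where the guide says the program terminates.

```python
from typing import Optional, Tuple


def divide(dividend: float, divisor: float,
           supplied: Optional[float] = None,
           zero_label: Optional[str] = None) -> Tuple[float, Optional[str]]:
    """Model DIV with three, four, or five parameters.

    Returns (value stored into the third variable, label or None).
    """
    if divisor != 0:
        return dividend / divisor, None  # normal case: never branches
    if supplied is None:
        # three-parameter form: division by zero terminates the program
        raise ZeroDivisionError("DIV: division by zero, program terminates")
    # four/five-parameter form: store the supplied quotient, then maybe branch
    return supplied, zero_label


print(divide(6.0, 0.0, -1.0, "ZERO"))  # (-1.0, 'ZERO')
```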

EBC

The EBCDIC instruction converts characters, which are read into the
program by means of an A1 format field, into floating point numbers.
The first variable contains the character. The result of the table
lookup will be stored into the second variable. Example:

EBC (6)(89).

The  table  used  for  the  conversions  is  located  following  the 
subparameter  descriptions.  This  instruction  is  often  used  to  prepare 
character  codes  for  input  to  a  SOUPAC  FREQUENCY  program. 

EXC 

The  EXCHANGE  instruction  exchanges  the  contents  of  two  variables. 
Example : 


EXC (1)(2).


EXI 


The EXIT instruction causes immediate termination of the TRANSFORMATIONS
program, but execution will continue with the SOUPAC program which follows.
Example:

EXIT. 
EXP

The  EXPONENT  BASE  e  instruction  raises  e  to  the  power  of  the  first 
variable  and  stores  the  result  in  the  second  variable.   Example: 


EXP (31)(704).


FAC 


The FACTORIAL instruction calculates the factorial of the first
variable and stores the result in the second variable. Example:

FAC(87)(120).

FIX 

The FIXED POINT CONVERSION instruction converts the floating point
variable indicated by the first parameter to fixed point and stores the
result in the second variable. Example:


FIX (2)(9).


FLO 


The FLOATING POINT CONVERSION instruction converts the fixed point
variable indicated by the first parameter and stores the result into the
second variable. Example:

FLOAT  (7)  (8). 

GO 

The  GO  TO  instruction  unconditionally  branches  to  the  label  indicated 
by  the  only  parameter.   Example: 

GO  TO  "LABEL". 

IF 

The  IF  statement  has  four  parameters.  The  first  parameter  contains  a 
variable  and  the  remaining  three  parameters  are  labels.  If  the  variable 
is  negative,  zero,  or  positive  the  instruction  will  branch  to  the  first, 
second,  or  third  label  respectively.   Example: 

IF  (33)  "L1""L2""L3". 

If  variable  33  is  positive  the  instruction  will  branch  to  label  "L3". 
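The IF and DIF instructions described above follow the same three-way rule, which can be sketched in Python. The function names `three_way_branch` and `difference_if` are hypothetical, and each simply returns the label that would be branched to.

```python
from typing import Tuple


def three_way_branch(value: float, labels: Tuple[str, str, str]) -> str:
    """IF with three labels: negative, zero, positive."""
    negative, zero, positive = labels
    if value < 0:
        return negative
    if value == 0:  # in Python, -0.0 == 0.0, so minus zero takes this branch
        return zero
    return positive


def difference_if(first: float, second: float,
                  labels: Tuple[str, str, str]) -> str:
    """DIF with two variables and three labels: branch on first - second."""
    return three_way_branch(first - second, labels)


print(three_way_branch(7.5, ("L1", "L2", "L3")))  # L3
print(difference_if(2.0, 5.0, ("A", "B", "C")))    # A
```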

INP 

By use of the INPUT instruction you can read in rows of data from sources
other than the main input matrix. The first parameter indicates which unit
the data will be input from. The second parameter indicates the starting
variable number where the row is to be placed. Examples:

INPUT (S3)(500).

If S3 contains 64 variables, this input instruction will read the
next row from S3 and store it in variables 500 through 563. Be careful
to avoid overwriting existing variables which you need, and also to avoid
trying to read more or fewer rows than exist on a particular unit.

INPUT (S4)(601)"EOF".

The program branches to the label "EOF" if an end of file occurs.

INPUT (S5)(701)"B"(12)"(12F4.1)".

This form is for formatted input. The fourth parameter is the number of
variables and the fifth parameter is a standard FORTRAN format.

LAS 

The LAST CARD instruction allows instructions to be performed after
the last row of data has been read in and processed. The LAST instruction
divides a program into regular and last card segments. The regular
segment, like a TRANSFORMATIONS program without the LAST option, is
executed once for every row of data. After all the main input data
is processed, the last card segment is executed once. One of the main
uses of the LAST instruction is to analyze data accumulated in variables
1001-2000 during the regular segment. Example:

LAST.

Only one LAST instruction may be used in a TRANSFORMATIONS program, and
branching between the regular and last card segments is prohibited.

ELO 

The  LOG  BASE  e  instruction  takes  the  natural  log  of  the  first  variable 
and  stores  it  in  the  second  variable.  Example: 


ELOG (3)(17).


LOG 


The  LOG  BASE  10  instruction  takes  the  base  10  log  of  the  first 
variable  and  stores  it  into  the  second  variable.  Example: 


LOG (18)(34).


MAX 


The MAXIMUM VALUE instruction has from three to one hundred parameters
pointing to variables. The largest value among the first through the
next to last variables is stored into the last variable. Examples:

MAX  (1)(2)(7). 

MAX  (1)  (7)  (8)  (11)  (15). 

MIN 


The MINIMUM VALUE instruction has from three to one hundred parameters
pointing to variables. The smallest value among the first through the
next to last variables is stored into the last variable. Examples:

MIN (11)(13)(29).

MIN (1)(3)(4)(7)(8)(10)(12).
MOD

The MODULAR ARITHMETIC instruction finds the value of the first
variable modulo the second variable and stores the result into the
third variable. Example:

MOD (1)(4)(7).
MOV 

The MOVE instruction stores a copy of the first variable into the
second variable. If a value already exists in the second variable, it
will be overwritten. Example:


MOVE (3)(9).


MUL 


The MULTIPLY instruction has from three to one hundred parameters
pointing to variables. The first variable through the next to last
variable are multiplied together and the result is stored in the
last variable. Examples:

MUL (1)(2)(20).

MUL (3)(4)(5)(6)(10).

NO 

The NOOP instruction does nothing. Its primary use is as a labeled
placeholder in the TRANSFORMATIONS subprogram to which many different
instructions branch. It is commonly used at the end of a TRANSFORMATIONS
subprogram where several isolated groups of instructions all need to
branch to the end. Example:


"END"  NOOP. 


OUT 


The OUTPUT TO UNIT instruction is the only way the TRANSFORMATIONS
program can output a row of data. There are two or more parameters. The
first indicates the unit to which the row of variables should be output.
The other parameters can be either single variables or ranges of
variables. Examples:

OUT (S2)(7).

This will output onto S2 a row with one variable.

OUT (S4)(8,14).    or    OUT (S4)(8)(9,11)(12,14).

This will output onto S4 a row with seven variables, variable eight
through variable fourteen. All outputs to the same unit must have the
same number of variables. A more detailed description of the types
of ranges which are allowed appears in Sec. 9 on DO-notation.
PER

The PERMUTE instruction permutes the order of all or a subset of
your variables. It can have from two to one hundred parameters, all
indicating variable numbers. The first parameter indicates the starting
point where a string of variables should be placed. The rest of the
parameters compose that string of variables. Each of the parameters
in that string represents either a single variable or a range of variables.
Examples:

PER (100)(3)(10,14)(4,5).

This example places variable 3 in variable 100, variables 10 through 14
in variables 101 through 105, and variables 4 and 5 in variables
106 and 107. A more detailed description of the types of ranges which
are allowed appears in Sec. 9 on DO-notation ranges.

PER (2)(1,1999).

This example will not propagate variable 1 through all the variables.
It will perform the intended purpose of moving every variable up by one:
variable 1 into variable 2, variable 2 into variable 3, and so on.
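The last PERMUTE example works because all source values are read before any target is written. A minimal Python sketch, assuming a hypothetical `permute` function, a 1-based list of variables (element 0 unused), and ranges given as `(first, last)` or `(first, last, step)` tuples:

```python
from typing import List, Sequence, Tuple, Union

# A parameter is a single variable number or a (first, last[, step]) range.
Param = Union[int, Tuple[int, ...]]


def permute(v: List[float], start: int, params: Sequence[Param]) -> None:
    """Model PER (start)(params...) on a 1-based list v (v[0] unused)."""
    sources: List[int] = []
    for p in params:
        if isinstance(p, tuple):
            first, last, *rest = p
            step = rest[0] if rest else 1
            sources.extend(range(first, last + 1, step))
        else:
            sources.append(p)
    # Snapshot every source value first, so overlapping ranges such as
    # PER (2)(1,1999) shift the values instead of propagating variable 1.
    values = [v[i] for i in sources]
    for offset, value in enumerate(values):
        v[start + offset] = value


v = [float(i) for i in range(108)]  # variable n holds the value n
permute(v, 100, [3, (10, 14), (4, 5)])
print(v[100:108])  # [3.0, 10.0, 11.0, 12.0, 13.0, 14.0, 4.0, 5.0]
```

Copying element by element in place, without the snapshot, would instead spread variable 1 into every later position.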

RAD 

The RADIANS TO ANGLE instruction converts the first variable, which
should be a measure in radians, into an angle and stores the result into
the second variable. Example:


RAD (9)(13).


RAI 


The RAISE instruction raises the first variable to the power contained in the
second variable and stores the result into the third variable.
Example:

RAISE  (1)(2)(7). 

REC


The RECODE instruction recodes variables depending on the satisfaction
of a set of conditions. The sequence of parameters depends on the
number of conditions that must be met. The RECODE instruction
introduces a new set of terminology, which follows:

A.  RELATIONAL OPERATORS

    "LT"     less than
    "LE"     less than or equal to
    "EQ"     equal to
    "NE"     not equal to
    "GE"     greater than or equal to
    "GT"     greater than

B.  CONNECTIVES

    "AND"
    "OR"
C.  CONDITION  SET

A  condition  set  consists  of  three  parameters,  the  first  and 
third  parameters  pointing  to  variables  and  the  second 
parameter  containing  a  relational  operator.   Any  condition  set  is 
either  true  or  false  depending  upon  whether  the  two  variables 
satisfy  the  conditions  of  the  relational  operator. 

D.  RECODE  SET 

A  recode  set  consists  of  two  parameters.   The  first  parameter 
indicates  a  variable.   The  second  parameter  indicates  a  variable 
or  a  floating  point  constant.   If  a  recode  set  is  executed 
the  value  of  the  second  variable  or  floating  point  constant 
is  stored  into  the  first  variable. 

The RECODE instruction consists of from one to twenty-one condition
sets, joined together by connectives when there is more than one. This is
followed by a recode set to be executed if the logical product of the
condition sets is true and, optionally, a second recode set to be executed
if the logical product is false. Examples:

REC (4)"EQ"*3*(4)*1*(4)*0*.

If variable 4 equals 3.0, recode it to 1.0; if not, recode it
to 0.0.

REC (6)"GE"(10)"AND"(7)"LE"(20)(110)(111).

If variable 6 is greater than or equal to variable 10 and variable 7 is
less than or equal to variable 20, then recode variable 110 to the
value of variable 111.
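The logic of a RECODE whose condition sets are joined by "AND" can be sketched in Python. This is a hypothetical model, not SOUPAC internals: variables live in a dictionary keyed by variable number, an `int` operand stands for a variable and a `float` operand for a constant (written `*3*` in SOUPAC), and "OR" is not handled here.

```python
import operator
from typing import Dict, List, Optional, Tuple, Union

# Assumption of this sketch: int = variable number, float = constant.
Operand = Union[int, float]
OPS = {"LT": operator.lt, "LE": operator.le, "EQ": operator.eq,
       "NE": operator.ne, "GE": operator.ge, "GT": operator.gt}


def fetch(v: Dict[int, float], x: Operand) -> float:
    """Value of a variable (int) or the constant itself (float)."""
    return v[x] if isinstance(x, int) else x


def recode(v: Dict[int, float],
           conditions: List[Tuple[Operand, str, Operand]],
           if_true: Tuple[int, Operand],
           if_false: Optional[Tuple[int, Operand]] = None) -> None:
    """Apply one recode set depending on condition sets joined by AND."""
    holds = all(OPS[op](fetch(v, a), fetch(v, b)) for a, op, b in conditions)
    chosen = if_true if holds else if_false
    if chosen is not None:  # with no second recode set, a false result does nothing
        target, source = chosen
        v[target] = fetch(v, source)


v = {4: 3.0}
recode(v, [(4, "EQ", 3.0)], (4, 1.0), (4, 0.0))  # REC (4)"EQ"*3*(4)*1*(4)*0*.
print(v[4])  # 1.0
```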

SIG

The  SIGN  instruction  places  the  sign  of  the  first  variable  on 
the  second  variable.   It  is  often  used  for  saving  signs  of  variables 
during  intermediate  calculations.   Example: 


SIGN (1)(901).


SIN 


The  SINE  instruction  takes  the  sine  of  the  first  variable  and  stores 
it  into  the  second  variable.   Example: 


SIN (1)(11).


SKI 


The SKIP instruction has two parameters. The first parameter is a
sequential unit. The second is a variable or floating point number.
The number in the variable, or the floating point number, indicates the
number of rows to be skipped on the specified unit. Examples:

SKIP (S2)(7).
SKIP (S5)*1*.
SQU

The SQUARE ROOT instruction takes the square root of the first variable
and stores it into the second variable. Example:


SQU (13)(14).


SUB 


The  SUBTRACT  instruction  subtracts  the  second  variable  from  the  first 
variable  and  stores  the  result  into  the  third  variable.   Example: 

SUB (7)(9)(10).

SUM 

The SUM instruction sums a string of consecutive variables, starting with
the variable indicated by the first parameter and ending with the variable
indicated by the second parameter, and places this sum into the variable
pointed to by the third parameter.

COM 

The  COMBINE  instruction  has  four  parameters.   They  must  be  a  variable,  a 
floating  point  constant,  and  two  variables  in  that  order.   The  first  variable 
is  multiplied  by  the  floating  point  constant  and  the  variable  in  the  third 
parameter  is  added  to  the  product.   The  result  is  placed  in  the  variable 
pointed  to  by  the  fourth  parameter. 


SUP 


The  SUPPRESS  instruction  stops  the  printing  of  all  warning  messages. 


WAR 


The  WARNING  instruction  causes  all  warning  messages  to  be  printed.   This 
is  the  normal  condition  unless  a  SUPPRESS  instruction  is  used. 


XAD  -  fixed  point  addition 

XDI  -  fixed  point  divide 

XIF  -  fixed  point  if 

XMU  -  fixed  point  multiply 

XSM  -  fixed  point  sum 

XSU  -  fixed  point  subtract 


XAD  (1)(2)(20). 

XDI (3)(4)(21).

XIF  (5)"T""0""P". 

XMU (6)(7)(22).

XSM  (8)(15)(23). 

XSU (16)(17)(24).


The preceding fixed point instructions have the same parameters as the
corresponding floating point instructions, except that the arithmetic
instructions are restricted to two input variables and division by zero in
the XDIVIDE instruction will terminate the program. The variables used in
the fixed point instructions must be in fixed point representation rather
than the normal floating point representation.

ZAP 

The  ZAP  instruction  zeros  out  variables  1001  through  2000. 
VII.  EBCDIC Conversion Table


Character    Floating Point Number       Character    Floating Point Number

blank            -0.                     ¢                 74.
0                 0.                     .                 75.
1                 1.                     <                 76.
2                 2.                     (                 77.
3                 3.                     +                 78.
4                 4.                     |                 79.
5                 5.                     &                 80.
6                 6.                     !                 90.
7                 7.                     $                 91.
8                 8.                     *                 92.
9                 9.                     )                 93.
A                10.                     ;                 94.
B                11.                     ¬                 95.
C                12.                     -                 96.
D                13.                     /                 97.
E                14.                     ,                107.
F                15.                     %                108.
G                16.                     _                109.
H                17.                     >                110.
I                18.                     ?                111.
J                19.                     :                122.
K                20.                     #                123.
L                21.                     @                124.
M                22.                     '                125.
N                23.                     =                126.
O                24.                     "                127.
P                25.
Q                26.
R                27.
S                28.
T                29.
U                30.
V                31.
W                32.
X                33.
Y                34.
Z                35.
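The table amounts to a simple character-to-number lookup. This Python sketch (the name `ebc` is hypothetical) reproduces it: blank maps to minus zero, digits to themselves, letters A through Z to 10 through 35, and each special character to the number in the right-hand columns, which is its EBCDIC code point in decimal. Characters not in the table raise `KeyError` in this sketch.

```python
# Special characters and their numbers, as listed in the right-hand columns.
SPECIAL = {
    "¢": 74, ".": 75, "<": 76, "(": 77, "+": 78, "|": 79, "&": 80,
    "!": 90, "$": 91, "*": 92, ")": 93, ";": 94, "¬": 95, "-": 96,
    "/": 97, ",": 107, "%": 108, "_": 109, ">": 110, "?": 111,
    ":": 122, "#": 123, "@": 124, "'": 125, "=": 126, '"': 127,
}


def ebc(ch: str) -> float:
    """Convert one A1-format character to its floating point number."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch == " ":
        return -0.0  # blank becomes minus zero
    if "0" <= ch <= "9":
        return float(ch)
    if "A" <= ch <= "Z":
        return float(ord(ch) - ord("A") + 10)  # A=10 ... Z=35
    return float(SPECIAL[ch])  # KeyError for characters not in the table


print(ebc("A"), ebc("$"), ebc(" "))  # 10.0 91.0 -0.0
```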
VIII.  DO-Notation

DO-notation is a facility provided in the TRANSFORMATIONS program to enable
a user to easily and compactly perform an operation on a set of variables
instead of performing that operation on each of the variables individually.
This concept of DO-notation corresponds to the concept of FORTRAN DO-loops.
The following form of DO-notation is used in a parameter which
points to a variable:

(V1, V2, I)

V1 = the initial variable of the set

V2 = the criterion variable for termination of the set

I = the increment

Examples: 

(1, 5, 1) points to variables 1, 2, 3, 4, and 5.

(1, 5, 2) points to variables 1, 3, and 5.

(4, 14, 3) points to variables 4, 7, 10, and 13.

If the increment is not specified, it is assumed to be 1. Example:

(2, 10, 1) is equivalent to (2, 10).
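The expansion of `(V1, V2, I)` behaves like a counted loop and can be sketched with Python's `range`. The name `do_range` is hypothetical, and only positive increments are assumed.

```python
from typing import List


def do_range(first: int, last: int, increment: int = 1) -> List[int]:
    """Expand (V1, V2, I): V1, V1+I, V1+2I, ... while not exceeding V2."""
    if increment <= 0:
        raise ValueError("this sketch only handles positive increments")
    return list(range(first, last + 1, increment))  # +1 makes V2 inclusive


print(do_range(4, 14, 3))  # [4, 7, 10, 13]
```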

A.  The major use of DO-notation is to indicate repetition of an instruction
on different sets of variables. Examples:

ADD(1,5,2)(6,8)(12,14).     is equivalent to     ADD(1)(6)(12).
                                                 ADD(3)(7)(13).
                                                 ADD(5)(8)(14).

MUL(1,4)(100)(101,107,2).   is equivalent to     MUL(1)(100)(101).
                                                 MUL(2)(100)(103).
                                                 MUL(3)(100)(105).
                                                 MUL(4)(100)(107).

If, in an instruction containing several parameters using DO-notation,
the sets of variables are unequal in length, the instruction will cycle
until the longest set of variables has been exhausted. Once a shorter set
has been exhausted, its last variable is used for the remaining cycles.
Example:

MAX(1,7,3)(14,15)(13,15,2).  is equivalent to    MAX(1)(14)(13).
                                                 MAX(4)(15)(15).
                                                 MAX(7)(15)(15).

The secondary use of DO-notation, called DO-notation ranges, applies to
the CONSTANT, OUTPUT, and PERMUTE instructions. Instead of the instruction
being repeated on each variable in the set, the instruction is executed
once. The parameter in which the DO-notation range occurs now points to
a string of variables.
Examples:

OUTPUT(S1)(2,7).      outputs the string of variables
                      2, 3, 4, 5, 6, and 7.

OUTPUT(S2)(3,7,2).    outputs the string of variables
                      3, 5, and 7.

PERMUTE (100)(3)(10,15)(20,24,2).

                      has two parameters of DO-notation
                      ranges in one instruction. The instruction
                      places into variables 100 through 109 the
                      following variables: 3, 10, 11, 12, 13, 14, 15,
                      20, 22, 24.

Looking at the CONSTANT instruction, we see DO-notation ranges used
to indicate strings of fixed and floating point constants as well
as strings of variables.

The first parameter of CONSTANT can point to a variable or a string
of variables. The latter is the only case which involves DO-notation
ranges, so the discussion will be confined to that case.


CON(2,4)(1)(2)(3).    places the fixed point constants one, two,
                      and three into the string of variables
                      2, 3, 4.


By using DO-notation ranges with fixed point constants, the equivalent
instruction would be

CON(2,4)(1,3).

The next two examples are also equivalent. They both assign to
variables five through eight the floating point constants
2.5, 5.0, 7.5, and 10.0.

CON(5,8)*2.5**5.0**7.5**10.0*.

CON(5,8)*2.5,10.0,2.5*.

The initial value of the string of
floating point constants is 2.5. The
termination criterion is 10.0. The
increment is 2.5.

All three types of DO-notation ranges can be used together in the
CONSTANT instruction. The following example combines both of the
preceding sets of examples into one instruction.

CON(2,8)(1,3)*2.5,10.0,2.5*.

Note: If, when using DO-notation ranges with the CONSTANT instruction,
the string of variables is unequal in length to the total number of
fixed and/or floating point constants indicated, the string of constants
is truncated if it is longer, and if it is shorter the last constant
is assigned to the remainder of the variables.
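The truncation and padding rule can be sketched in Python. The names `float_range` and `constant_range` are hypothetical, variables are kept in a dictionary, and fixed and floating point constants are both stored as Python floats, which ignores the representation difference the guide draws.

```python
from typing import Dict, List


def float_range(first: float, last: float, step: float) -> List[float]:
    """Expand *first,last,step*; the small epsilon guards against rounding."""
    count = int((last - first) / step + 1e-9) + 1
    return [first + k * step for k in range(count)]


def constant_range(v: Dict[int, float], first: int, last: int,
                   constants: List[float]) -> None:
    """Assign constants to variables first..last, truncating or padding."""
    for i, var in enumerate(range(first, last + 1)):
        # surplus constants are never reached; missing ones reuse the last
        v[var] = constants[min(i, len(constants) - 1)]


# CON(2,8)(1,3)*2.5,10.0,2.5*.
v: Dict[int, float] = {}
constants = [1.0, 2.0, 3.0] + float_range(2.5, 10.0, 2.5)
constant_range(v, 2, 8, constants)
print([v[i] for i in range(2, 9)])  # [1.0, 2.0, 3.0, 2.5, 5.0, 7.5, 10.0]
```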
IX.  Flag Notation

Flag notation is the TRANSFORMATIONS program's version of indirect addressing.
Instead of a parameter pointing directly to a variable, it points indirectly
to a variable through another variable: the parameter points to a variable
which in turn points to another variable. This feature enables an instruction
to point to different sets of variables depending upon the values assigned to
the intermediate variables.

The main type of flag notation is called F-flag notation. F-flag notation
is indicated by inserting an F directly after the variable number. Example:

ADD(7F)(8F)(9F).

Restriction: The values in the intermediate variables of flag notation
(variables 7, 8, and 9 in this example) must be in fixed point
representation and must point to a valid variable number.

Let the notation Vn indicate variable n. Example: V7 indicates variable 7.


If  V7 = 100
    V8 = 150
    V9 = 180

then the preceding example would generate, after the indirect addressing
takes place:

ADD(100)(150)(180).

F-flag notation can also be used with DO-notation. Example:

If  V100 = 2      V110 = 6      V120 = 10
    V101 = 13     V111 = 16     V122 = 19
    V102 = 5      V112 = 9      V124 = 13

then MUL(100F,102F)(110F,112F)(120F,124F,2). would generate, after indirect
addressing:

MUL(2)(6)(10).

MUL(13)(16)(19).

MUL(5)(9)(13).

Note that with F-flag notation the limits of the DO-notation are extracted
from the intermediate variables and not from the final variables.

The other type of flag notation is called D-flag notation. Its only
use is with DO-notation. The difference between it and F-flag notation is
that with D-flag notation the limits of the DO-notation are derived from the
final variables after indirect addressing takes place. Example, using the
same variable values as above:

MUL(100D,102D)(110D,112D)(120D,124D).

would generate, after indirect addressing:

MUL(2,5)(6,9)(10,13).

or

MUL(2)(6)(10).
MUL(3)(7)(11).
MUL(4)(8)(12).
MUL(5)(9)(13).
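The difference between the two flag types lies only in when the indirection happens, which a short Python sketch makes explicit. The names `f_flag` and `d_flag` are hypothetical, and the intermediate variables from the examples above are held in a dictionary.

```python
from typing import Dict, List


def f_flag(v: Dict[int, int], first: int, last: int, step: int = 1) -> List[int]:
    """F-flag: expand the DO range over the intermediate variables first,
    then replace each one by the variable number stored in it."""
    return [v[i] for i in range(first, last + 1, step)]


def d_flag(v: Dict[int, int], first: int, last: int) -> List[int]:
    """D-flag: look up the two limits first, then expand between them."""
    return list(range(v[first], v[last] + 1))


# Intermediate variables from the examples above (fixed point values).
V = {100: 2, 101: 13, 102: 5, 110: 6, 111: 16, 112: 9,
     120: 10, 122: 19, 124: 13}
print(f_flag(V, 100, 102))     # [2, 13, 5]
print(f_flag(V, 120, 124, 2))  # [10, 19, 13]
print(d_flag(V, 100, 102))     # [2, 3, 4, 5]
```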


X.  A TRANSFORMATIONS Example

Let us consider a set of data consisting of ten variables. Variable one
contains a zero for females and a one for males. Variables two through five
and eight through ten all range from zero to ninety-nine and are to be collapsed
into the numbers one to four representing quartiles. Variables six and seven
are to be recoded into dichotomous variables depending on whether the value is
five or not five. The output is then split into males and females suitable for
input into two separate FREQUENCY programs. Also desired are averages of
variables two through ten before recoding takes place. Totals are kept during
the regular segment, and then in the last card segment these are divided by
the sample size and printed.


TRANSFORMATIONS (C).                        inputs one row from cards
PERMUTE (6)(8,10)(6,7).                     reorders the variables
ADD (1001)*1*(1001).                        increments row number by one
ADD (1002,1010)(2,10)(1002,1010).           adds respective values to row totals
RECODE (2,8)"LT"*25*(2,8)*1*.               recodes first quartile values
RECODE (2,8)"GE"*75*(2,8)*4*.               recodes fourth quartile values
RECODE (2,8)"GE"*50*(2,8)*3*(2,8)*2*.       recodes third & second quartile values
RECODE (9,10)"EQ"*5*(9,10)*1*(9,10)*0*.     creates dichotomous variables
IF (1)"BAD""FEM""MALE".                     branches based on male or female
"BAD"   ABORT.                              aborts program due to bad data
"FEM"   OUTPUT (S2)(2,10).                  outputs female data
GO TO "END".                                branches around male output
"MALE"  OUTPUT (S3)(2,10).                  outputs male data
"END"   NOOP.                               this instruction is only a placeholder
LAST.                                       indicates beginning of last card segment
DIVIDE (1002,1010)(1001)(1002,1010).        calculates averages
OUTPUT (P)(1001,1010).                      prints sample size and averages
END PROGRAM


The regular segment is executed once for every row of card input data. Here
the values are recoded and output, while row totals are also kept in variables
greater than 1000. The last card segment is executed once to calculate and print
the averages.
XI.  Notes and Ideas


1.  Missing data of the form -0.0 can be differentiated from 0.0 only in
the recode statement, so if this distinction must be made, -0.0 should first
be recoded to another value before testing.
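The same pitfall exists with IEEE 754 floating point, used here only as an analogy (the IBM 360 used its own hexadecimal floating point format): minus zero compares equal to zero, and only the sign bit tells them apart. A minimal Python sketch with a hypothetical helper `is_minus_zero`:

```python
import math


def is_minus_zero(x: float) -> bool:
    """True only for -0.0: numerically equal to zero, but with the sign bit set."""
    return x == 0.0 and math.copysign(1.0, x) < 0


print(-0.0 == 0.0)          # True: an ordinary comparison cannot tell them apart
print(is_minus_zero(-0.0))  # True
print(is_minus_zero(0.0))   # False
```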


2.  The valid outputs in the OUTPUT instruction are PRINT and/or
Sn, n ≤ 15.

3.  When collapsing data, be careful not to overlap your recoding and
inadvertently recode values twice or more.

4.  For those not familiar with the terminology in the RECODE instruction,
more examples appear below.


Given:

V1 = 10      V3 = 13      V5 = 23
V2 = 23      V4 = 89      V6 = -7

The following condition sets have the respective truth values:

(1) "GT" (2)   is false
(3) "LE" (4)   is true
(2) "EQ" (5)   is true
(6) "GE" *0*   is false
(2) "NE" (5)   is false
(6) "LT" (4)   is true

If two or more condition sets are joined by "AND", they must all be true for
the logical product to be true. If two or more condition sets are joined
by "OR", then the logical product is true if any of the condition sets is
true. If the connectives are mixed, then the "AND" connective is of higher
precedence than the other connectives in the same way that multiplication is
of higher precedence than addition in ordinary arithmetic.
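The precedence rule can be sketched in Python by splitting the chain at each "OR" and requiring every condition set within a group to be true. The name `logical_product` is hypothetical; its inputs are the truth values of the condition sets in order and the connectives between them.

```python
from typing import List


def logical_product(truths: List[bool], connectives: List[str]) -> bool:
    """Combine condition-set truth values; AND binds tighter than OR."""
    if len(connectives) != len(truths) - 1:
        raise ValueError("need one connective between each pair of condition sets")
    groups = [[truths[0]]]  # each group is a run of sets joined by AND
    for conn, truth in zip(connectives, truths[1:]):
        if conn == "AND":
            groups[-1].append(truth)
        elif conn == "OR":
            groups.append([truth])  # OR starts a new group
        else:
            raise ValueError(f"unknown connective {conn!r}")
    return any(all(group) for group in groups)


# (3)"LE"(4) "OR" (1)"GT"(2) "AND" (6)"GE"*0*, with the values given above:
print(logical_product([True, False, False], ["OR", "AND"]))  # True
```

Evaluating strictly left to right would give false for the same chain, which is why the precedence matters.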

5.  The A-C, A-S, ELOG, LOG, and SQU instructions have optional third
and fourth parameters: a variable and a branch address. When the output
is not defined or the input variable is invalid, the value of the third
variable is stored as the output value and the instruction branches to the
branch address. If the branch address is not specified, processing continues
with the next instruction.


Examples:


SQU(1)(18)(200).
SQU(1)(18)(200)"ERR".


6.  Do not use F-flag notation with the CONSTANT, PERMUTE, or OUTPUT
instructions.


7.  If  you  are  inputting  from  a  double  precision  matrix  and  have  over 
500  variables,  beware!  Change  the  input  matrix  to  single  precision  before 
TRANSFORMATIONS  or  see  Bill  Walter  in  the  SOUPAC  Office. 


BASIC  POPULATION  STATISTICS  PACKAGE 
BASIC POPULATION STATISTICS

In this section are the programs commonly used to provide the user
with a "first look" at his data. It is neither expected, nor is it a
good idea, that the researcher be naive about his data (it is, of course,
assumed that any hypothesis to be tested was made before collecting
the data). Tabulations or "frequency counts," cross tabulations, sample
means, rank scores, etc. are all useful "summary statistics" that may
indicate warning signals concerning assumptions made by the experimenter
that might be questioned.

In addition, the statistics required for many basic techniques will
be found in this section. These include, for instance, the sample mean
and other statistics derived from the moments of a sample, rank order
statistics, the non-parametric Mann-Whitney U statistic, and the $\chi^2$
statistic for testing independence of variables (appropriate for considering
the hypothesis of independence of two or more variables from the same sample).
FREQUENCY  COUNTING  AND
MEASURES  OF  ASSOCIATION 


I.   General  Description 

This  program  computes  tables  of  the  frequency  of  occurrence  of 
values  that  input  variables  take,  and  where  appropriate,  measures  of 
association  may  be  computed.   Input  to  the  program  may  be  in  the  form  of 
previously  computed  tables  (on  which  measures  of  association  will  be 
computed)  or  may  be  in  the  form  of  raw  data.   Only  integer  numbers 
may  be  counted;  decimal  point  data  will  be  rounded.   Negative  values 
are  allowed. 

A.   FREQUENCY  COUNTING 

The  following  options  are  available : 

1,  Either  one-dimensional  or  two-dimensional  tables  may  be 
specified.  For  one-dimensional  counts,  the  frequency  of 
occurrence  for  each  value  of  the  variable  is  listed.  For 
two-dimensional  coionts,  each  value  of  the  second  variable 
is  counted  separately  for  each  value  of  the  first  variable. 

2.  Control  variables  may  be  used  which  enable  counting  to  be  done  in 
up  to  12  dimensions.   If  control  variables  are  specified,  data 
must  be  presorted  on  these  variables.  When  the  value  of  any 
variable  designated  a  control  variable  changes  from  one  row 

to  the  next,  counting  is  stopped  and  a  new  table  is  started. 
Thus  counting  proceeds  as  long  as  the  values  of  all  control 
variables  remain  constant. 


3t   The  minimimi  and  maximum  values  to  be  attained  may  be  specified 
separately  for  each  variable  to  be  counted  (or,  optionally, 
not  specified  at  all'^ .   If  either  the  maximum  or  minim-um  values 
are  not  specified,  they  will  be  determined  from  the  data  using 
an  extra  read  of  the  data.  Values  which  fall  below  the 
minimum  or  above  the  maximum  are  ignored.   This  capability  adds 
flexibility  to  the  program  and  may  be  an  appreciable  cost  saver. 
Its  misuse  by  gross  estimates  of  minimum  and  maximum  values 
can,  however,  be  costly. 

4. For each cell in a one-dimensional table, the percentage of the
total sample that was counted there will be printed, if requested.
For two-dimensional tables, the percentage of the row and of the
column may also be requested.

5. A weighting variable may be specified. Without a weighting
variable, frequency counts are advanced by one for each occurrence
of a value. When a weighting variable is used, the frequency
counts are advanced by the value of the weighting variable for
the row. Thus some rows of data may be given more importance
than others.
6. Labels can be given for variables so that output is more
readable. Each label is restricted to eight characters or
fewer.

7. Input may be from previously computed two-dimensional tables,
from which measures of association can be directly computed.
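The counting options above can be sketched in Python. This is a minimal illustration and not the SOUPAC implementation. It assumes each row is a tuple of numbers indexed from 0 (SOUPAC numbers variables from 1), and every function name is hypothetical. The sketch shows rounding to integers, optional minimum and maximum bounds, a weighting variable, total percentages, and a new table at each control break.

```python
from collections import Counter
from itertools import groupby

def as_count_value(x):
    # Only integers are counted; values with a fraction are rounded.
    # Python's round() (ties go to the even integer) is an assumption.
    return int(round(x))

def one_way(rows, var, lo=None, hi=None, weight=None):
    """Frequency of each value of column `var` (0-based) in `rows`."""
    values = [as_count_value(r[var]) for r in rows]
    # A bound that is not given is found from the data (the "extra read").
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    table = Counter()
    for r, v in zip(rows, values):
        if lo <= v <= hi:                      # out-of-range values are ignored
            table[v] += 1 if weight is None else r[weight]
    return table

def two_way(rows, var1, var2, weight=None):
    """Count each value of var2 separately for each value of var1."""
    table = Counter()
    for r in rows:
        key = (as_count_value(r[var1]), as_count_value(r[var2]))
        table[key] += 1 if weight is None else r[weight]
    return table

def total_percent(table):
    """Per cent of the total sample counted in each cell."""
    total = sum(table.values())
    return {k: 100.0 * c / total for k, c in table.items()}

def with_controls(rows, controls, var):
    """Rows must be presorted on the control variables. A new table
    starts whenever any control value changes from one row to the next."""
    def control_values(r):
        return tuple(r[c] for c in controls)
    return [(k, one_way(list(g), var))
            for k, g in groupby(rows, key=control_values)]
```

In this sketch each control group finds its own bounds from its data. The real program may instead apply one range to every table.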


B. MEASURES OF ASSOCIATION

The following coefficients are calculated and printed, on option, for
two-way tables:


1.   Chi-square  and  related  coefficients 

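Let: $n$ = total population of the table

$n_{ab}$ = number of observations in Vertical classification a (column a)
and Horizontal classification b (row b)

$n_{a.} = \sum_b n_{ab}$, $n_{.b} = \sum_a n_{ab}$

$\alpha$ = number of rows
$\beta$ = number of columns

Then: chi-square

$$\chi^2 = \sum_a \sum_b \frac{\left(n_{ab} - n_{a.}\,n_{.b}/n\right)^2}{n_{a.}\,n_{.b}/n}$$

adjusted chi-square (Yates' correction for continuity), for 2x2
tables only:

$$\chi^2_{\mathrm{adj}} = \sum_a \sum_b \frac{\left(\left|n_{ab} - n_{a.}\,n_{.b}/n\right| - 1/2\right)^2}{n_{a.}\,n_{.b}/n}$$

$$C = \sqrt{\frac{\chi^2/n}{1 + \chi^2/n}}$$

$$T = \sqrt{\frac{\chi^2/n}{\sqrt{(\alpha - 1)(\beta - 1)}}}$$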

C (Pearson's coefficient of contingency) and T (Tchuprow's coefficient)
are measures of contingency. Both are discussed in the references below.
The maximum expected frequency is also printed.
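The formulas above can be checked with a minimal Python sketch. It assumes the table is a list of rows (b), each holding one count per column (a), with no empty row or column, because an empty one would make an expected frequency zero. The function name is hypothetical. Following the text, Yates' value is returned only for 2x2 tables.

```python
import math

def chi_square_measures(table):
    """table[b][a] = n_ab: one list per row b, one count per column a."""
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]             # n_.b
    col_tot = [sum(col) for col in zip(*table)]       # n_a.
    chi2 = chi2_yates = max_expected = 0.0
    for b, row in enumerate(table):
        for a, n_ab in enumerate(row):
            e = col_tot[a] * row_tot[b] / n           # expected frequency n_a. n_.b / n
            max_expected = max(max_expected, e)
            chi2 += (n_ab - e) ** 2 / e
            chi2_yates += (abs(n_ab - e) - 0.5) ** 2 / e
    alpha, beta = len(table), len(table[0])           # number of rows, columns
    phi2 = chi2 / n
    C = math.sqrt(phi2 / (1 + phi2))
    T = math.sqrt(phi2 / math.sqrt((alpha - 1) * (beta - 1)))
    yates = chi2_yates if (alpha, beta) == (2, 2) else None   # 2x2 tables only
    return chi2, yates, C, T, max_expected
```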


2. Lambda coefficients

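Let: $n_{am} = \max_b n_{ab}$

$n_{mb} = \max_a n_{ab}$

$n_{.m} = \max_b n_{.b}$

$n_{m.} = \max_a n_{a.}$

$$\text{Lambda} = \frac{\sum_a n_{am} + \sum_b n_{mb} - n_{.m} - n_{m.}}{2n - n_{.m} - n_{m.}}$$

$$\text{Lambda H} = \frac{\sum_a n_{am} - n_{.m}}{n - n_{.m}}$$

$$\text{Lambda V} = \frac{\sum_b n_{mb} - n_{m.}}{n - n_{m.}}$$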


Lambda  coefficients  will  be  indeterminate  if  all  values  lie  in 
one  column  or  row. 

Lambda H is the relative decrease in the probability of error in
predicting the H-variable when the value of the V-variable is known,
compared with predicting the H-variable without that knowledge (by
always guessing its most frequent category).

Ninety-five per cent confidence limits are calculated and printed
for Lambda H and Lambda V using the methods discussed by Goodman
and Kruskal in their second article (see References). Lambda
always lies between Lambda H and Lambda V.
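The three lambda formulas can be computed directly from the four maxima defined above. The sketch below assumes the same layout (rows b for the H-variable, columns a for the V-variable). Its name is hypothetical, and it does not compute the Goodman-Kruskal confidence limits.

```python
def lambdas(table):
    """table[b][a] = n_ab (rows b: H-variable, columns a: V-variable).
    Returns (Lambda, Lambda H, Lambda V)."""
    cols = list(zip(*table))
    n = sum(map(sum, table))
    sum_n_am = sum(max(col) for col in cols)     # sum over a of max_b n_ab
    sum_n_mb = sum(max(row) for row in table)    # sum over b of max_a n_ab
    n_dm = max(sum(row) for row in table)        # n_.m = max_b n_.b
    n_md = max(sum(col) for col in cols)         # n_m. = max_a n_a.
    lam = (sum_n_am + sum_n_mb - n_dm - n_md) / (2 * n - n_dm - n_md)
    lam_h = (sum_n_am - n_dm) / (n - n_dm)       # ZeroDivisionError if all in one row
    lam_v = (sum_n_mb - n_md) / (n - n_md)       # ... or all in one column
    return lam, lam_h, lam_v
```

The division errors in the last two formulas correspond to the indeterminate cases noted in the text.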


3. Weighted Lambda Coefficients

$$\text{Weighted Lambda H} = \frac{\sum_a n_{am}/n_{a.} - \max_b \sum_a n_{ab}/n_{a.}}{\alpha - \max_b \sum_a n_{ab}/n_{a.}}$$

$$\text{Weighted Lambda V} = \frac{\sum_b n_{mb}/n_{.b} - \max_a \sum_b n_{ab}/n_{.b}}{\beta - \max_a \sum_b n_{ab}/n_{.b}}$$

These are Lambda H and Lambda V calculated using the weighted quantities
$n_{ab}/n_{a.}$ and $n_{ab}/n_{.b}$ respectively, instead of $n_{ab}$.
No confidence limits are provided for the weighted lambda
coefficients.
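Here is a minimal sketch of the weighted coefficients. It again assumes rows b and columns a, with no empty row or column. In place of α and β it uses the total of the weighted quantities. When every row and column is non-empty, that total equals the number of categories of a (for Lambda H) or of b (for Lambda V). The function name is hypothetical.

```python
def weighted_lambdas(table):
    """table[b][a] = n_ab with no empty row or column.
    Returns (weighted Lambda H, weighted Lambda V)."""
    cols = [list(c) for c in zip(*table)]
    # Lambda H on n_ab / n_a.: every column rescaled to total 1.
    w = [[x / sum(col) for x in col] for col in cols]            # w[a][b]
    best_b = max(sum(w[a][b] for a in range(len(w))) for b in range(len(table)))
    total_w = sum(map(sum, w))                                     # = number of columns
    wl_h = (sum(max(col) for col in w) - best_b) / (total_w - best_b)
    # Lambda V on n_ab / n_.b: every row rescaled to total 1.
    v = [[x / sum(row) for x in row] for row in table]             # v[b][a]
    best_a = max(sum(v[b][a] for b in range(len(v))) for a in range(len(cols)))
    total_v = sum(map(sum, v))                                     # = number of rows
    wl_v = (sum(max(row) for row in v) - best_a) / (total_v - best_a)
    return wl_h, wl_v
```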


4. Gamma Coefficient

Let: $PS = \sum_a \sum_b n_{ab} \sum_{a'>a} \sum_{b'>b} n_{a'b'}$

$PD = \sum_a \sum_b n_{ab} \sum_{a'>a} \sum_{b'<b} n_{a'b'}$

Then

$$\text{Gamma} = \frac{PS - PD}{PS + PD}$$


Ninety-five  per  cent  confidence  limits  for  Gamma  are  calculated 
using  the  method  outlined  and  preferred  in  the  second  article  by 
Goodman  and  Kruskal. 
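PS and PD count the pairs of observations that are ordered the same way (concordant) and the opposite way (discordant) on the two variables. The quadruple loop below transcribes the two sums directly, with no attempt at speed. The function name is hypothetical. Transposing the table leaves Gamma unchanged, so the row/column orientation does not matter here.

```python
def gamma(table):
    """Goodman-Kruskal Gamma for a table of counts whose rows and columns
    are both in ascending order of their categories."""
    rows, cols = len(table), len(table[0])
    ps = pd = 0
    for a in range(rows):
        for b in range(cols):
            for a2 in range(a + 1, rows):          # a' > a
                for b2 in range(cols):
                    if b2 > b:                     # b' > b: same order (PS)
                        ps += table[a][b] * table[a2][b2]
                    elif b2 < b:                   # b' < b: opposite order (PD)
                        pd += table[a][b] * table[a2][b2]
    return (ps - pd) / (ps + pd)
```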


References 


These coefficients are discussed and compared by Leo A. Goodman and
William H. Kruskal in their article "Measures of Association for
Cross Classifications", American Statistical Association Journal,
December, 1954.

The  Gamma  coefficient  is  their  suggested  measure. 

The  C  coefficient  was  first  suggested  by  Karl  Pearson  and  the  T 
coefficient  is  due  to  Tchuprow. 

The Lambda coefficients apparently were first suggested by Louis
Guttman ("The Prediction of Personal Adjustment", Bulletin 48,
Social Science Research Council, New York, 1941).

The development of the approximate sampling theory and of the
machinery for calculating the confidence intervals for Lambda and
Gamma was done in a sequel article by Goodman and Kruskal: "Measures
of Association for Cross Classifications III: Approximate Sampling
Theory", American Statistical Association Journal, June, 1963.

The  statistics  that  are  requested  will  be  printed  immediately 
following  each  table. 

II. Restrictions

The program is currently limited to 450 input variables and 1000
tables. Tables are restricted to a maximum of 80,000 cells,
each of which can hold a maximum count of 32,767. As many tables as
will fit into work storage (80,000 cells) will be computed in each read
of the data. If the tables will fit into 80,000 cells, card input is
allowed. If maxima and minima are not specified for card input, the data
will be transferred to disk during the preread of the data.
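The number of reads of the data a job needs depends on how many tables fit into the 80,000-cell work storage. The sketch below makes two assumptions that the write-up does not document: each table needs (max - min + 1) cells per dimension, multiplied across dimensions, and tables are packed greedily in the order requested. The function names are hypothetical. The sketch also shows why a gross minimum/maximum estimate is costly.

```python
from math import prod

WORK_CELLS = 80_000  # work storage limit stated under Restrictions

def cells(ranges):
    """Cells one table needs; ranges = [(min, max), ...], one pair per dimension."""
    return prod(hi - lo + 1 for lo, hi in ranges)

def plan_passes(tables):
    """Assumed greedy rule: fill each read of the data in request order."""
    passes, current, used = [], [], 0
    for t in tables:
        need = cells(t)
        if need > WORK_CELLS:
            raise ValueError(f"table needs {need} cells, limit is {WORK_CELLS}")
        if used + need > WORK_CELLS:
            passes.append(current)          # start another read of the data
            current, used = [], 0
        current.append(t)
        used += need
    if current:
        passes.append(current)
    return passes
```

For example, a two-way table with ranges 0-99 by 0-399 needs 40,000 cells, so only two such tables fit in one read.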
III. Parameters

A.  Main  Parameter  Card 

Immediately following the program name FREQUENCY (mnemonic: FRE), the
following parameters are listed, each enclosed in parentheses, with a
period after the last parameter used:


Parameter
Number      Use or Meaning

1           Input Address.

2           Blanks:
              0 - ignore blanks
              1 - count blanks separately
              2 - count blanks as zeroes

3           Spacing:
              0 - normal spacing
              1 - one table per page

4           Address of labels.

5           Variable number of weight variable.

6           Type of input:
              0 - raw data
              n - where n is the number of previously computed tables.
                  If n > 1, then input must be from cards, and each
                  table is a separate data deck.

If both parameters 1 and 4 are cards, the labels must precede the data.

B.   Subparameters 

Subparameters  follow  the  main  parameter  card  and  can  be  in  any  order. 
A  period  must  follow  each  subparameter  statement  though  the  statement 
can  be  continued  on  more  than  one  card.   If  the  subparameter  statement 
is left out, the option is not used. In the following explanation,
I = integer and F = real number.


Mnemonic                Use or Meaning

PER(I)(I)(I).           Per cents are requested (0 - no, 1 - yes):
                          1st integer = total per cent
                          2nd integer = row per cent
                          3rd integer = column per cent

MIN*F**F*... .          Minimum and maximum are given. The last value is
MAX*F**F*... .          propagated to any remaining variables. Data will
                        be reread if either MIN or MAX is missing.

MEA(I)(I)(I)(I).        Measures are requested (0 - no, 1 - yes):
(only applicable          1st integer = χ² (with a code of 2, both χ² and
to 2-way tables)            a table of expected frequencies will be printed)
                          2nd integer = λ (lambda)
                          3rd integer = weighted λ
                          4th integer = γ (gamma)

CONTROL(I)(I)...        Up to 10 control variables are allowed. The I's
                        should be the variable numbers of the control
                        variables.

ONE(I,I,I)(I,I,I)...    One and only one of these two must be used in
TWO(I,I,I)(I,I,I)...    every program. ONE means one-way tables; TWO
                        means two-way tables. In ONE, (I,I,I) specifies
                        one range of 1-dimensional tables. In TWO,
                        (I,I,I)(I,I,I) specifies one range of
                        2-dimensional tables.

The notation (I,I,I) has the following meaning. If it is absent
completely, i.e., ONE. or TWO., then all possible tables are calculated.
The first integer is the initial value, the second is the terminal
value and the third is the increment. It means: take all values
starting at the first integer and stepping by the third integer until
you reach the second integer. If the third integer is missing, the
increment is taken to be one. If the second is also missing, then
the first is taken as a single table specification. As many ranges as
wanted can be specified, subject to the following restriction: in
the two-way tables, no more than 5^0 separate ranges, i.e., (I,I,I)(I,I,I),
can be specified.
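The range notation can be expanded mechanically, as in the hypothetical Python sketch below. It reproduces the table lists of Examples 1 and 2 in section V. It assumes that ranges given to TWO come in pairs (first-variable range, second-variable range). It does not handle the empty ONE. / TWO. case, which needs the number of variables.

```python
def expand(spec):
    """(first, [last], [step]) -> list of variable numbers, last inclusive."""
    first = spec[0]
    last = spec[1] if len(spec) > 1 else first   # single table if last is missing
    step = spec[2] if len(spec) > 2 else 1       # increment defaults to one
    return list(range(first, last + 1, step))

def one_way_list(specs):
    """ONE(...)(...): concatenate every range."""
    return [v for s in specs for v in expand(s)]

def two_way_pairs(specs):
    """TWO(...)(...): ranges taken in pairs; each variable of the first range
    is tabulated against each variable of the second."""
    pairs = []
    for left, right in zip(specs[0::2], specs[1::2]):
        pairs += [(i, j) for i in expand(left) for j in expand(right)]
    return pairs
```

For instance, one_way_list([(2, 6, 2), (9,)]) gives the tables 2, 4, 6 and 9 of ONE(2,6,2)(9).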

Labels 

Labels can come from cards or temporary storage. Each label should
be treated as if it were two variables, each 4 characters long. For
example, if there are 6 variables in the input data, then there would be
twelve variables for labels and the data card would be DATA(12)(12A4).

All  the  labels  are  treated  as  one  row  of  input  n  variables  long. 
Labels  need  not  be  given  for  each  variable  but  if  a  variable  is  skipped 
and  more  labels  follow,  then  it  should  be  replaced  with  eight  blanks. 
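A small sketch of the label layout follows. It assumes labels arrive as Python strings. Each label is padded to eight characters, so a skipped variable becomes eight blanks, and is then split into two four-character A4 fields. The function names are hypothetical, and cutting longer labels to eight characters is an assumption.

```python
def label_fields(labels, nvars):
    """Split labels (up to 8 characters each) into two A4 fields per variable.
    A missing or empty label becomes eight blanks."""
    fields = []
    for i in range(nvars):
        text = labels[i] if i < len(labels) else ""
        text = text[:8].ljust(8)            # assumption: longer labels are cut to 8
        fields += [text[:4], text[4:]]
    return fields

def label_data_card(nvars):
    """Format card for the single row of labels: 2 * nvars fields of A4."""
    n = 2 * nvars
    return f"DATA({n})({n}A4)"
```

With 6 input variables, label_data_card(6) gives DATA(12)(12A4), matching the text above.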

V.   Examples 


1. FRE(C).
PER(1).
CONTROL(1)(3).
ONE(2,6,2)(9).
END P

Input is from cards; per cent of totals will be printed; control variables
are 1 and 3; resulting tables are 2, 4, 6, and 9. Blanks will be ignored.
2. FRE(C)(2)(1)(C)(2).
PER(1)(1)(1).
TWO(3)(4)(2,6,2)(1,5,2).
END P

All per cents will be given; 2 will be the weighting variable; resulting
tables will be 3 vs 4; 2 vs 1; 2 vs 3; 2 vs 5; 4 vs 1; 4 vs 3; 4 vs 5; 6 vs 1;
6 vs 3; 6 vs 5. Tables will be printed one per page and blanks will be
counted as zeroes.

Since  both  labels  and  data  are  on  cards  the  deck  will  look  like  this: 
FRE(C) 

• 

END  S 
DATA(n)(nA4) 

label for first labeled variable ... label for last labeled variable

END# 

DATA(n/2)( ) 

• 

END# 

3.  FRE(S1)(1). 
TWO. 
MEA(2)(1)(1)(1). 

END  P 

All  possible  two-way  tables  will  be  calculated;  all  four  measures  will  be 
calculated  and  the  table  of  expected  frequencies  will  be  printed.  Blanks 
will  be  counted  separately. 

4. FRE(C)()()()()(2).

MEA(1)()(1).

END P

Input is in the form of two previously computed tables. χ² and weighted λ
will be calculated.

Since  there  are  two  tables  the  deck  will  look  like  this: 
FRE(C) 

• 

END  S 

DATA 

• 

END# 

DATA 

END# 
VI. Output Examples

A.   ONE-DIMENSIONAL  TABLE 

A one-dimensional frequency table might be output as follows:


VALUE      1   ...    4     5
FREQ       3   ...   25    30

TOTAL     60


This table indicates that the value 1 occurred 3 times, that the
value 4 occurred 25 times, and so on, for a total of 60.

B.   TWO-DIMENSIONAL  TABLE 

A  two-dimensional  frequency  table  might  look  like  this: 

VARIABLE 1 ACROSS
VARIABLE 2 DOWN

[Table of counts: values of variable 2 down the side, values of variable 1 across the top]

SAMPLE SIZE = 23


This table would indicate that simultaneous observations of 1 for
variable 1 and 2 for variable 2 occurred once. A value of 3 for
variable 1 at the same time as a value of 5 for variable 2 occurred
8 times. The number of observations in the sample was 23.


RANK  ORDERING  PROGRAM 


I. General Description

A.  Purpose 

The RANK ORDERING program receives as input a raw data matrix and
produces as output a matrix in which each element has been replaced
by a number denoting the rank of the element WITHIN ITS COLUMN. In
other words, each column of the input matrix is considered a separate
variate and will be converted to a corresponding ranking.

The smallest variate-value is assigned rank 1.0, the next largest
rank 2.0, etc., until the largest variate-value is assigned the
highest rank. In the case of tied values, identical ranks are assigned
to equal values, the rank number being set equal to the average of the
ranks which would occur if the tied values were distinguishable. This
is sometimes known as the "mid-rank method".
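The mid-rank rule can be sketched in Python as follows. The names are hypothetical. Values are compared exactly here, so the single-word precision caveat under Restrictions does not arise.

```python
def mid_ranks(column):
    """Rank values (smallest = 1.0); ties get the average of the ranks they span."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    ranks = [0.0] * len(column)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and column[order[j + 1]] == column[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1        # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_matrix(rows):
    """Replace each element by its rank within its column."""
    cols = [mid_ranks(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```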

B.  References 

Kendall, Maurice G., Rank Correlation Methods, Charles Griffin and
Co., Ltd., London, 1948.

II. Restrictions

A.  Input 

The input data to this program may come from any source. If cards
are used as input, the number of rows in the input matrix must be
specified on the data format card, and the total number of elements
in the matrix may not exceed 30,000. The maximum number of rows
for any matrix input to this program is 30,000, and the maximum
number of columns for any matrix input to this program is 450.

B. Output

If  an  input  matrix  contains  more  than  30,000  elements,  an  automatic 
partitioning  of  the  input  data  occurs  such  that  each  partition  contains 
the  maximum  number  of  complete  columns  possible  within  the  constraint 
that  no  one  partition  may  contain  more  than  30,000  elements. 

The results of the ranking of each partition are output separately,
one partition per output address specified as a parameter on the
program parameter card. A maximum of twenty-one such output addresses
is allowed.
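The automatic partitioning can be estimated in advance, which helps in deciding how many output addresses to give. This sketch assumes that partitions take whole columns from left to right, each holding floor(30,000 / rows) columns. The function name is hypothetical.

```python
MAX_ELEMENTS = 30_000

def partitions(nrows, ncols):
    """Assumed layout: whole columns, left to right, at most 30,000 elements each.
    Returns 1-based inclusive column ranges, one per output address needed."""
    per = MAX_ELEMENTS // nrows            # complete columns that fit in one partition
    return [(first, min(first + per - 1, ncols))
            for first in range(1, ncols + 1, per)]
```

For a 1000-row, 100-column matrix this gives four partitions (columns 1-30, 31-60, 61-90 and 91-100), so four output addresses would be specified.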


CAUTION:   If  partitioning  is  anticipated,  the  user  should  specify  one  output 
address  for  each  partition  anticipated.   This  warning  applies  especially  in 
the  case  where  printed  or  punched  output  occurs.   Printing  and  punching  will 
occur  only  for  the  partitions  for  which  it  is  specified. 
The exception is for partitions beyond the twenty-first. For partitions
beyond the twenty-first, printing and punching are done if they were
specified for the twenty-first partition. However, no partitions beyond
the twenty-first may be stored on a peripheral device (SEQUENTIAL address).

C.   Data 

Since all comparisons in this program are done with single-word-length
operands, in some cases the program may not be able to
differentiate between two values which agree through the first five
significant digits and differ only in subsequent digits.


III.   Parameters 


The parameters for the RANK ORDERING program must follow the program
name on the program call card in the order given below:


Parameter
Number      Use or Meaning

1           Input Address.*

2-23        Output Address.

IV.   Special  Comments 

If the rank order correlation coefficient ρ (Spearman's rho) is desired,
the rankings should be input to the CORRELATION program (see individual
program description) and the Product Moment Correlation coefficient
obtained.
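Spearman's rho is the product-moment correlation of the rankings. The sketch below (hypothetical names) ranks both variables with mid-ranks and then computes the Pearson correlation. This mirrors the sequence described above: RANK ORDERING, then CORRELATION.

```python
import math

def ranks(xs):
    """Mid-ranks: tied values share the average of their positions (1-based)."""
    s = sorted(xs)
    first = {}
    for pos, v in enumerate(s, start=1):
        first.setdefault(v, pos)               # first position of each value
    counts = {v: s.count(v) for v in first}
    return [first[v] + (counts[v] - 1) / 2 for v in xs]

def pearson(x, y):
    """Product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    return pearson(ranks(x), ranks(y))
```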


*If CARDS are used, the DATA card must contain the number of rows as well
as the number of columns in the input matrix (see User's Guide for details).


STANDARD  SCORES 


Note: Sample Size, Mean, Standard Deviation, Variance, Skewness and Kurtosis
are referred to as "Basic Statistics" in this write-up.


I.   General  Description 

This program is used to calculate the following:


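Mean:

$$\bar{X}_j = \frac{1}{N}\sum_{i=1}^{N} X_{ij}$$

Variance:

$$V_j = \frac{N\sum_{i=1}^{N} X_{ij}^2 - \left(\sum_{i=1}^{N} X_{ij}\right)^2}{N\,(\mathrm{d.f.})}$$

where the degrees of freedom d.f. = N or N-1 (see parameter 4 for
explanation).

Standard Deviation:

$$S_j = \sqrt{V_j}$$

Skewness:

$$\mathrm{Skew}_j = \frac{\sum_{i=1}^{N} X_{ij}^3 - \bar{X}_j\left(3.0\sum_{i=1}^{N} X_{ij}^2 - \bar{X}_j\left(3.0\sum_{i=1}^{N} X_{ij} - N\bar{X}_j\right)\right)}{N S_j^3}$$

Kurtosis:

$$\mathrm{Kurt}_j = \frac{\sum_{i=1}^{N} X_{ij}^4 - \bar{X}_j\left(4.0\sum_{i=1}^{N} X_{ij}^3 - \bar{X}_j\left(6.0\sum_{i=1}^{N} X_{ij}^2 - \bar{X}_j\left(4.0\sum_{i=1}^{N} X_{ij} - N\bar{X}_j\right)\right)\right)}{N S_j^4} - 3.0$$

Standardized Scores:

$$Z_{ij} = \frac{X_{ij} - \bar{X}_j}{S_j}$$

Standardized Scores about Mean = A and Std. Dev. = B:

$$Z_{ij}\,B + A$$

Moving Averages:

$$Y_j = \frac{1}{b}\sum_{k=j}^{b+(j-1)} X_k$$

where b = length of the period.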
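The formulas above can be checked with a short Python sketch. It accumulates the four power sums and applies the nested skewness and kurtosis expressions exactly as written. The names are hypothetical, and unbiased=True plays the role of parameter 4 = 1. The sketch does not cover missing-data handling (parameter 9) or the single-observation case.

```python
import math

def basic_statistics(x, unbiased=False):
    """(N, mean, S.D., variance, skewness, kurtosis) for one variable."""
    n = len(x)
    s1 = sum(x)
    s2 = sum(v * v for v in x)
    s3 = sum(v ** 3 for v in x)
    s4 = sum(v ** 4 for v in x)
    mean = s1 / n
    df = n - 1 if unbiased else n                  # parameter 4: 1 -> N-1, 0 -> N
    var = (n * s2 - s1 * s1) / (n * df)
    sd = math.sqrt(var)
    # Nested forms expand to the third and fourth central moments.
    skew = (s3 - mean * (3.0 * s2 - mean * (3.0 * s1 - n * mean))) / (n * sd ** 3)
    kurt = (s4 - mean * (4.0 * s3 - mean * (6.0 * s2
            - mean * (4.0 * s1 - n * mean)))) / (n * sd ** 4) - 3.0
    return n, mean, sd, var, skew, kurt

def standard_scores(x, unbiased=False, a=None, b=None):
    """Z = (X - mean) / S; with a and b given, rescaled as Z*B + A."""
    _, mean, sd, *_ = basic_statistics(x, unbiased)
    z = [(v - mean) / sd for v in x]
    if a is not None and b is not None:
        z = [zi * b + a for zi in z]
    return z
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 with d.f. = N, the mean is 5 and the standard deviation is 2. The scores about a mean of 50 and a standard deviation of 10 therefore run from 35 to 70.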


II.   Restrictions 


A. The maximum number of variables is 450.


B. "Basic Statistics" may be calculated using as many as 30 control
variables. Data must be presorted (for instance with SORT-MERGE or
on a card sorting machine) on the control variables.
C. In obtaining moving averages, where nvar = number of variables and
b = length of period, the largest allowable nvar*b is fixed for any
core size. If a design will not fit, a message will be printed giving
the proper increment for Region.

D.  Moving  averages  are  exclusive  of  all  other  options. 

E.  Output  is  of  four  categories: 

1. With or Without Control Breaks

a.  "Basic  Statistics" 

b.  Moving  averages:   "Basic  Statistics" 

2.  Without  Control  Breaks 

a.  Standard  scores  about  data  mean  and  standard  deviation. 
Printed  output  includes  "Basic  Statistics." 

b.  Standard  scores  about  a  given  mean  and  standard  deviation. 
Printed  output  includes  "Basic  Statistics." 

c. Both 2a and 2b at the same time; output may be
printed and/or stored in two different storage locations.

d. Control breaks may not be used with the Standard Scores option.
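Here is a sketch of the moving-average formula. It assumes a single variable given as a Python list and a period b, and the function name is hypothetical. Each complete period of b consecutive observations yields one average. The fall-back for fewer than b observations follows the description of parameter 8, but its exact behaviour is an assumption.

```python
def moving_averages(x, b):
    """Y_j = (X_j + ... + X_{j+b-1}) / b for every complete period of length b."""
    if len(x) < b:
        # Assumption: with fewer observations than b, average what is there.
        return [sum(x) / len(x)] if x else []
    window = sum(x[:b])
    out = [window / b]
    for j in range(b, len(x)):
        window += x[j] - x[j - b]     # slide the period one observation forward
        out.append(window / b)
    return out
```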


III.   Parameters 


The  parameters  appear  on  the  program  call  card  following  the  program  name 
STANDARD  SCORES  (or  the  program  mnemonic  STA)  in  this  order: 


Parameter
Number      Description

1           Input Address. SEQUENTIAL 1-15. Cards may be used if only
            "Basic Statistics" are desired, or if precalculated means
            and standard deviations are supplied (see parameter 10).

2           Output Address of Standardized Scores.

3           Output Address for "Basic Statistics." "Basic Statistics"
            can be put out on a temporary unit. Output is in the form
            of six column vectors (N, X̄_j, S_j, V_j, Skew_j, Kurt_j).

4           If 1, use N-1; if 0, use N as the denominator of the
            standard deviations. N-1 gives an unbiased estimate of the
            population variance and the usual estimate of the population
            standard deviation. N gives the sample standard deviation
            and sample variance.

5           Output Address for Standard Scores about a specified Mean
            and Standard Deviation. SEQUENTIAL 1-15 and/or PRINT.

6           If parameter 5 is being used, place the desired Mean between
            asterisks, for example, *50*.

7           If parameter 5 is being used, place the desired Standard
            Deviation between asterisks, for example, *5*.

8           Moving Averages: give the number of periods (observations)
            over which the data are to be averaged (i.e., b). If control
            variables are being used and/or the actual number of
            observations is less than stated, the data will be averaged
            using the actual number of observations.

9           If set equal to 1, "Basic Statistics" will be corrected for
            missing data. Missing data must be coded as -0.0 (blanks).

10          Input Address of Means and Standard Deviations. The first
            row contains means and the second row contains standard
            deviations. All additional rows input are ignored. Valid
            only for standard scores.


If using controls, list the variable numbers of the control variables
on a separate card immediately after the STANDARD SCORES card.
For example, if controlling on variables 1, 2, and 4:

STA(S2)(P).
$C-B(1)(2)(4).

NOTE: If there is only 1 observation and parameter 4 is set to 1, then the
"Basic Statistics" for that variable will be set to zero. If blanks are
checked and standard scores are requested, those observations which have a
blank will remain blank after calculation of standard scores.
IV. Example 1 (Use of Parameter 10)

// EXEC SOUP
//SYSIN DD *
MAT.                                  1
MOV(C)(S1).
END P
COR(S1)(S2).                          2
MAT.                                  3
TRA(S2)(S3).
END P
STA(S1)(P)()()()****()()(S3).         4
END S
DATA(10)(10F1.0)
END#
/*

Example 2

// EXEC SOUP
//SYSIN DD *
MAT.                                  1
MOVE(C)(S1/P).
END P
STA(S1)()(P).                         2
STA(S1)()(P)(1).                      3
STA(S1)(P)(P).                        4
STA(S1)()()()()****(5).               5
END S
DATA(36)(36F2.0)
Data deck is placed here
END#
/*

Explanation  of  Example  1 

Program 1 will move data from cards to Sequential 1.

Program 2 will calculate means, standard deviations, and sample size, and
save the results in column form on Sequential 2.

Program 3 transposes the contents of Sequential 2 and stores them in row
form on Sequential 3.

Program 4 reads the means and standard deviations from Sequential 3
(ignoring all subsequent rows) and calculates and prints Standard Scores.

Explanation  of  Example  2 

Program 2 will print "Basic Statistics" for each variable, based on the
sample standard deviation.

Program 3 will print "Basic Statistics" for each variable, based on the
N-1 estimate of the population standard deviation.

Program 4 will print the standard scores matrix in addition to "Basic
Statistics."

Program 5 will print "Basic Statistics" over every set of 5 periods
(moving averages).
Chapter 1. General Description

1.1  Introduction 

BALANOVA 5 is a general analysis of variance program applicable to a
wide range of balanced designs. In the case of designs with a replication
factor, BALANOVA 5 allows inequality in the number of replications in each
cell. If the number of replications is equal or proportional, the analysis
is handled by least squares (weighted means). If the number of replications
is not proportional, then an unweighted means analysis is performed. This
is an approximation to the least squares solution.

BALANOVA 5 accepts some designs that are not completely crossed, namely
those nested designs in which all main factors are balanced. Hence hierarchical
designs are allowed. Repeated measures designs are also allowed; in these
designs the replication factor is not nested in all the other factors.

The design model may be fixed-effects, random-effects or mixed. BALANOVA 5
automatically determines all the legal sources of variation (main effects and
interactions) and determines the correct denominator mean square for those
sources which can be tested by an F test. To do this, BALANOVA 5 first
generates the expected mean square table, which is printed in readable form.
The method used closely follows Scheffe (1959), Chapter 8.

BALANOVA 5 will accept most of the designs described in Winer (1962),
Chapters 3, 4, 5, 6, and 7 and Lindquist (1953), Chapters 3, 5, 6, 7, 8, 9,
10, and 13 (Types I, III, VI). Chapter 2 and Appendix A of this write-up
contain a large number of examples drawn from these two books.

For proper use of BALANOVA 5, the following general warnings should be
kept in mind:

1. A general program such as BALANOVA 5 encourages the use
of statistics in a "cook-book" manner. Data are generated
to fit the input specifications of the program with no
consideration given to the theory of analysis of variance.
The experimenter who uses a computer program in this way
often neglects to consider whether the statistical test
is appropriate for the work he is interested in and whether
the assumptions needed for the test are satisfied in the
particular experiment he has used.

2. Results printed by a program such as BALANOVA are often accepted
by the experimenter as being infallible when, in fact, all
calculations on a digital computer are subject to possible
programming error, round-off error or simple machine error. Always
double check your results to make sure they "make sense."

3. In the particular case of analysis of variance, the idea
has become widespread that the summary table of F ratios
is the most important part of the analysis. This is not
the case. The most important part of analysis of variance
is the estimation of the main effects and the
interactions. Only by looking at their size can
the experimenter evaluate what is happening in
his experiment. To encourage this use
of analysis of variance, BALANOVA 5 prints a table
of marginal means which allows easy calculation of
all the effects in the experiment. The F table is
only a set of warning signals. A non-significant
F indicates that the corresponding differences
between effects can plausibly be attributed to chance alone.

4. BALANOVA 5 performs an unweighted-means analysis when
the replication numbers are non-proportional. The
author fears that this option will be used too often and
without consideration of its dangers. The unweighted-means
solution is often not satisfactory, and references
on analysis of variance should be consulted (Scheffe,
1959; Winer, 1962; Lindquist, 1953).
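Item 3 above says that effects follow easily from the table of marginal means. The sketch below illustrates this for a balanced two-factor layout with one mean per cell. It uses the textbook decomposition, not BALANOVA 5's own algorithm, and the function name is hypothetical. Main effects are marginal means minus the grand mean. An interaction is the cell mean minus both marginal means plus the grand mean.

```python
def effects(cells):
    """cells[i][j]: mean of cell (A level i, B level j); each cell weighted equally.
    Returns (grand mean, A effects, B effects, AB interaction effects)."""
    I, J = len(cells), len(cells[0])
    grand = sum(map(sum, cells)) / (I * J)
    a_means = [sum(row) / J for row in cells]                          # marginal means of A
    b_means = [sum(cells[i][j] for i in range(I)) / I for j in range(J)]  # marginal means of B
    a_eff = [m - grand for m in a_means]
    b_eff = [m - grand for m in b_means]
    ab = [[cells[i][j] - a_means[i] - b_means[j] + grand for j in range(J)]
          for i in range(I)]
    return grand, a_eff, b_eff, ab
```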

BALANOVA 5 was designed to reduce the great amount of hand computation
needed in analysis of variance calculations. It was not intended to eliminate
the necessity of the user being familiar with the theory of analysis of
variance. It is hoped that the above comments will discourage some
indiscriminate use of BALANOVA 5.

1.2  Special  Features 

The output from BALANOVA 5 consists of:

1. A table of the expected mean squares in readable
form.

2. The number of replications in each cell in the
case of designs with a replication factor.

3. The table of marginal means. All means entering
into the computation of the sums of squares are printed.

4. The analysis of variance summary table including,
for each source of variation, the sum of squares
and mean square, and for each source with a denominator,
the F ratio and the probability of the chance occurrence
of the F ratio.

A feature of BALANOVA 5 is its flexible specification of analysis of
variance designs, allowing a wide range of designs to be described by a
common code.


A  large  number  of  checks  are  made  by  BALANOVA  5  to  ensure  that  the  design 
is  legal  and  that  the  data  correspond  to  the  design.   Diagnostics  are  printed 
to  indicate  all  error  conditions. 
1.3 Legal Designs of BALANOVA 5

Consider the following definitions, taken from Scheffe (1959). Let
there be p factors in a design, not counting the replication factor, if
there is one. A cell is specified by a set of p levels, one for each
factor. The layout of a design is complete if there is at least one
observation in every cell. The factors in such a design are completely
crossed. If the design is complete and there is a replication factor
(i.e., all cells have at least one observation and at least one cell has
more than one observation), then the design is considered to be a Class A
design in BALANOVA 5.

There  are  many  analysis  of  variance  designs  which  are  not  complete 
in  the  above  sense.   Examples  of  incomplete  designs  are  Latin-square, 
incomplete  blocks  and  nested  designs.   The  only  incomplete  designs 
which  are  allowed  in  BALANOVA  5  are  nested  designs  which  are  balanced  in 
all  factors  except  for  the  replication  factor  (which  need  not  be  balanced) . 
These incomplete designs are called Class B and C designs. "Nesting,"
"balanced" and "replication factor" are defined in the next three
paragraphs. These definitions are illustrated in Chapter 2.

Nesting  may  be  defined  as  follows:  The  levels  of  a  factor  C  are 
nested  within  the  levels  of  a  factor  A  (in  short,  C  is  nested  within  A) 
if  and  only  if  each  level  of  C  appears  with  only  a  single  level  of  A  in 
the  observations.  Note  that  if  C  is  not  nested  within  A,  it  is  crossed 
with  A,  but  only  if  every  level  of  C  appears  with  every  level  of  A  is 
C  completely  crossed  with  A.  Latin-square  and  incomplete  block  designs 
are  only  partly  crossed. 
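The nesting and crossing definitions can be tested on the observed combinations of levels. In this hypothetical sketch, each observation is a dict mapping factor name to level. When A has a single level, every factor is trivially nested within it, and the sketch reports "nested" in that case.

```python
from collections import defaultdict

def relation(obs, c, a):
    """How factor c stands to factor a in the observations.
    obs: list of dicts mapping factor name -> level (hypothetical layout)."""
    seen_with = defaultdict(set)           # level of c -> levels of a it appears with
    for o in obs:
        seen_with[o[c]].add(o[a])
    a_levels = {o[a] for o in obs}
    if all(len(s) == 1 for s in seen_with.values()):
        return "nested"                    # each level of c meets a single level of a
    if all(s == a_levels for s in seen_with.values()):
        return "completely crossed"        # every level of c meets every level of a
    return "partly crossed"
```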

A  nested  factor  C  is  balanced  if  the  number  of  levels  of  C  is  the  same 
within  each  combination  of  those  factors  within  which  C  is  nested  and  the 
factors  (if  any)  which  are  crossed  with  C  are  completely  crossed. 

A replication factor, in BALANOVA 5, is a factor which is nested within
one or more other factors, but not necessarily within all other factors.
Furthermore, no factor may be nested within the replication factor. That is,
a factor is a replication factor if and only if for every other factor A in
the design, it is either nested within A or crossed with A. A replication
factor may be nested within some factors and crossed with others. There can
be at most one replication factor in a design.

The  distinction  is  made  between  replication  factors  and  other  nested 
factors  in  BALANOVA  5  since  replication  factors  do  not  have  to  be  balanced. 
All  other  factors  must  be  balanced. 


Using these definitions, the following designs are legal in BALANOVA 5.

Class  A  designs  (completely  crossed  with  nested  replications) 

Class A designs contain (p + 1) factors, of which p are the main factors
and the other factor is the replication factor. The following two conditions
must both be met for the design to be Class A.
(a) All p main factors are completely crossed.

(b)  The  replication  factor  is  nested  in  all  main  factors. 

Thus  one-way  and  factorial  designs  are  Class  A  designs. 

Class  B  designs  (other  replication  designs) 

Class B designs also contain (p + 1) factors, of which p are the main
factors and the other factor is the replication factor. However, one or
both of the two conditions, (a) and (b), are not satisfied in Class B
designs.

When (a) is not satisfied, that is, the p main factors are not
completely crossed, the main factors must satisfy the following
condition.

(a') Consider any two main factors, A and B. Either A is completely
crossed with B, or A is nested within B, or B is nested within
A. This must be true for all pairs of main factors. Furthermore,
at least one pair must have the nested relationship, or else (a)
would be satisfied.

When (b) is not satisfied, the following condition must be true.

(b') The replication factor is nested in at least one but not all
main factors. Note that the requirement that the replication
factor be nested in at least one factor is part of the basic
definition of a replication factor.

Class B designs, then, can be of the following two types.

Hierarchical designs: (a') and (b) are satisfied. The replication
factor is nested in all factors, but there is some nesting among the main
factors.

Repeated measures designs: (b') is satisfied. Either (a) or (a') can
be satisfied. The necessary feature (b') of repeated measures designs is
that the replication factor is crossed with one or more of the main factors.
The factors in which the replication factor is nested may themselves be either
crossed (a) or nested (a').

Class C designs (no replication factor)

Class C designs have p factors and there is no replication factor.
All factors must be balanced. For each pair of factors, e.g. factors A
and B, either A is completely crossed with B, or A is nested within B, or
B is nested within A. There does not have to be any nesting at all.
In summary, then, designs are classed in the following way in BALANOVA 5.
Class A and B designs have a replication factor; Class C designs do not.
Class A designs are distinguished from Class B designs in that a Class A
design must have (1) all main factors completely crossed and (2) the
replication factor nested in all main factors. Class B designs violate one
or both of these requirements.
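The classification rules above can be sketched as a small Python function. This is a hypothetical helper, not part of SOUPAC or BALANOVA 5. It describes each factor as the factor specification cards do: the numbers of the factors it is nested in, plus a replication flag. It also assumes that any two factors not related by nesting are completely crossed, so partly crossed designs such as Latin squares cannot be expressed this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str
    nested_in: frozenset = frozenset()  # 1-based numbers of the factors this one is nested in
    replication: bool = False


def classify(factors):
    """Return "A", "B" or "C" following the BALANOVA 5 design classes (sketch)."""
    reps = [n for n, f in enumerate(factors, start=1) if f.replication]
    if len(reps) > 1:
        raise ValueError("a design may have at most one replication factor")
    if not reps:
        return "C"  # no replication factor
    r = reps[0]
    if not factors[r - 1].nested_in:
        raise ValueError("the replication factor must be nested in at least one factor")
    if any(r in f.nested_in for f in factors):
        raise ValueError("no factor may be nested within the replication factor")
    main = {n for n in range(1, len(factors) + 1) if n != r}
    # Condition (a): no main factor is nested in another main factor.
    crossed = all(not (factors[n - 1].nested_in & main) for n in main)
    # Condition (b): the replication factor is nested in every main factor.
    nested_in_all = main <= factors[r - 1].nested_in
    return "A" if crossed and nested_in_all else "B"


# Two-way design with subjects (factor 3) nested in both main factors.
design = [Factor("A"), Factor("B"), Factor("Subjects", frozenset({1, 2}), True)]
print(classify(design))  # A
```

A hierarchical design such as hospitals within drugs, with patients as the replication factor, comes out as "B". A design with no replication factor comes out as "C".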

In Class A and B designs, the replication factor does not need to be
balanced. However, all nested factors except the replication factor (if
any) must be balanced. Recall that in Class A designs the replication
factor is the only nested factor.

As explained above, the replication factor is distinguished from
other nested factors because it does not have to be balanced. There are
two other reasons for distinguishing the replication factor from other
nested factors. These reasons are important even if the replication
factor is balanced.

1. In Class A designs (completely crossed with replications) only
cell means are stored in the computer, and thus very large designs
can be accommodated. The allowable number of replications in
each cell is virtually unlimited.

2. For all replication designs, whether of Class A or B, the level
number for the replication factor in each nest does not need to
run from one up to the maximum number of levels in each nest, as
it does for all other factors. Any convenient numbering of the
replications may be used (e.g. a unique number for every subject
in the experiment, regardless of the nest in which he appears).
This feature of BALANOVA 5 is especially useful when several
dependent variables are analyzed and there are missing data for
some of the subjects on some of the dependent variables.

1.4 Calculations for equal and unequal numbers of replications


The calculations performed by BALANOVA 5 for designs with a replication
factor (Class A and B designs) depend on whether the numbers of replications
in the cells are equal or unequal. If the numbers are equal, the standard
analysis of variance calculation is made (least-squares or weighted-means
analysis). If the numbers are unequal, a check is first made to see if the
cell N's are proportional. In a two-way analysis of variance, for example,
the cell N's are proportional if the number of replications in the ij cell,
$N_{ij}$, satisfies

$$N_{ij} = \frac{N_{iT}\,N_{Tj}}{N_{TT}}$$

where the T's indicate marginal totals. If the cell N's are proportional,
BALANOVA 5 makes the least-squares calculations, i.e. weighted means are
used. If the cell N's are not proportional, the method of unweighted
means is used (see Scheffe, pp. 262-3, or Winer, pp. 222-4).
In general, if i, j, k, ..., λ index those factors within which the
replication factor is nested (not necessarily all the factors in the design),
and if $N_{ijk\ldots\lambda}$ is the number of replications in a particular nested
cell, then the cell N's are proportional if, for all combinations ijk...λ,

$$N_{ijk\ldots\lambda} = \frac{N_{iTT\ldots T}\; N_{TjT\ldots T}\; \cdots\; N_{TTT\ldots\lambda}}{\left(N_{TTT\ldots T}\right)^{q-1}}$$
In this formula, the T's indicate marginal totals and q is the number of
factors within which the replication factor is nested. In particular, the
one-way analysis with unequal N's is a proportional design (i.e. the cell
N's are proportional) by this definition, since with q = 1

$$N_i = \frac{N_i}{\left(N_T\right)^{0}} = N_i .$$

In fact, any design in which the replication factor is nested in only one
factor is a proportional design.
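The proportionality condition can be checked mechanically. The Python sketch below is illustrative only and is not the BALANOVA 5 code. The function name and the dictionary layout (a tuple of levels of the q nesting factors mapped to that cell's replication count) are assumptions. The sketch computes the marginal totals and compares every cell, including empty ones, with the product of its margins divided by the grand total raised to the power q - 1.

```python
from itertools import product
from math import isclose, prod


def is_proportional(counts):
    """counts maps (i, j, ..., l) -> number of replications in that nested cell."""
    q = len(next(iter(counts)))            # number of factors the replication factor is nested in
    total = sum(counts.values())           # N_TT...T
    margins = [dict() for _ in range(q)]   # margins[d][level] = marginal total for factor d
    for cell, n in counts.items():
        for d, level in enumerate(cell):
            margins[d][level] = margins[d].get(level, 0) + n
    # Check every combination of observed levels, so an empty cell counts as N = 0.
    for cell in product(*(sorted(m) for m in margins)):
        expected = prod(margins[d][lvl] for d, lvl in enumerate(cell)) / total ** (q - 1)
        if not isclose(counts.get(cell, 0), expected):
            return False
    return True


print(is_proportional({(1,): 10, (2,): 12, (3,): 15}))                # True (one-way)
print(is_proportional({(1, 1): 2, (1, 2): 4, (2, 1): 3, (2, 2): 6}))  # True
print(is_proportional({(1, 1): 3, (1, 2): 3, (2, 1): 3, (2, 2): 4}))  # False
```

The one-way case always passes because the exponent q - 1 is zero, which matches the remark above.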

1.5  Parameters 

Each observation (row of data) input to this program must be identified
by a number for each factor, including the replication factor. These
numbers (which cannot be read in I format) represent the levels of the
corresponding factors and must precede the dependent variables. In the
output produced by the program, each factor is given a unique letter name,
beginning with A. Thus the first column of the input data corresponds to
the levels of factor A, which is described on the first factor specification
card (see below). Each additional factor is given the next letter in the
alphabet and a corresponding factor specification card. The dependent
variables follow the factor levels in the input data, and they are numbered
one through the total number of dependent variables in the output of the
program.

On the program call card, the following parameters follow the program
name, BALANOVA 5; the first five parameters are required.

Parameter
Number      Description

1           Input address. CARDS or SEQUENTIAL 1-15.
2           Number of factors, counting the replication factor
            if there is one. Maximum = 10.
3           Number of dependent variables.
4           Number of levels of the 1st factor.
5-13        Number of levels of the 2nd-10th factors.
14          1 if an unweighted-means analysis is desired even
            though the cell frequencies are proportional.
15          1 to suppress printing of all means.


Following the program card is a separate subparameter card (factor
specification card) for each factor, in the order in which the factors
appear in the input data. Each card has the following parameters.

Parameter
Number      Description

1           0 if fixed factor
            1 if random factor
2           0 if not the replication factor
            1 if it is the replication factor
3-11        Factors in which this factor is nested

As in other SOUPAC programs, unused parameters at the end of the card may
be omitted, with the period placed after the last non-zero parameter.
The factor specification cards must be followed by an END PROGRAM card.

1.6 Specification of a Design

Any design is described by listing the following information about each
factor in the design, including the replication factor if there is one. The
information for each factor is punched on a separate card (a factor
specification card), and the cards should be in the same order as the
factors are in the input data. Each parameter should be enclosed in
parentheses, and each card terminated by a period.

Parameter 1       Type of factor. The first parameter on each
                  factor specification card should be a zero
                  if the factor is fixed, and a one if it is
                  random. The replication factor is always a
                  random factor. At least one factor in every
                  design must be random.

Parameter 2       Replication factor. If the design has a
                  replication factor, this is indicated by
                  punching a one for the second parameter. A
                  design may have only one replication factor.
                  If there is no replication factor, the second
                  parameter should be zero (or blank) on all
                  of the factor specification cards.

Parameters 3-11   Nesting. The factors in which the given factor
                  (the one to which this card refers) is nested
                  are listed. Factors are numbered from one
                  through the number of factors in the design.
                  If the factor is not nested, parameters 3-11
                  may be completely omitted.

An example of this way of specifying a design will now be given. Consider
a two-way analysis of variance with subjects within cells. The
design is considered to have three (not two) factors, namely A and B, the
main factors, and C, the replication factor. Suppose there are 3 levels
of A and 4 levels of B and that each cell has 10 subjects. The cards used
to perform this analysis are listed below. Each line corresponds to one
IBM card.

BALANOVA(CARDS)(3)(1)(3)(4)(10).
(0)(0).
(0)(0).
(1)(1)(1)(2).
END PROGRAM

The first card listed above calls the BALANOVA program. The first
parameter is the location of the data (CARDS), the second is the number
of factors (3), the third is the number of dependent variables (1), the
fourth is the number of levels of the first factor (3), followed by the
number of levels of the second factor (4), and finally the number of
levels of the last factor, which, for the replication factor, is the
maximum cell size (10).

The second card is the factor specification card for factor 1. The
first two parameters are zero, labeling this factor as fixed and as not
being the replication factor. The third card is the factor specification
card for factor 2, which is also fixed and not the replication factor.
Note that parameters 3-11 are blank, as factors 1 and 2 are not nested in
any other factors.

The fourth card is the factor specification card for factor 3. Its
four parameters denote it as a random factor, as the replication factor,
and as nested in factors 1 and 2. The fifth card terminates the BALANOVA
program.
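The call card and the factor specification cards follow a fixed pattern, so they can be generated from a description of the design. The Python sketch below is a hypothetical convenience, not a SOUPAC facility. It only formats text in the layout of the example above and always writes the first two parameters of each factor card. It does not handle the optional parameters 14 and 15 of the call card.

```python
def balanova_deck(address, n_dependent, factors):
    """Build the card images for a BALANOVA 5 run (sketch).

    factors: one (levels, is_random, is_replication, nested_in) tuple per
    factor, in the order the factors appear in the input data; nested_in
    holds 1-based factor numbers.
    """
    if not 1 <= len(factors) <= 10:
        raise ValueError("BALANOVA 5 allows at most 10 factors")
    call = f"BALANOVA({address})({len(factors)})({n_dependent})"
    call += "".join(f"({levels})" for levels, *_ in factors) + "."
    cards = [call]
    for _levels, is_random, is_replication, nested_in in factors:
        if len(nested_in) > 9:
            raise ValueError("only parameters 3-11 are available for nesting")
        # Parameter 1: 0 fixed / 1 random; parameter 2: replication flag;
        # parameters 3-11: factors in which this factor is nested.
        params = [int(is_random), int(is_replication), *sorted(nested_in)]
        cards.append("".join(f"({p})" for p in params) + ".")
    cards.append("END PROGRAM")
    return cards


deck = balanova_deck("CARDS", 1, [
    (3, False, False, ()),      # factor 1 (A): fixed, 3 levels
    (4, False, False, ()),      # factor 2 (B): fixed, 4 levels
    (10, True, True, (1, 2)),   # factor 3 (C): subjects, nested in A and B
])
print("\n".join(deck))
```

For the two-way example, this reproduces the five cards listed above, ending with the END PROGRAM card.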


Chapter 2. Design Examples

2.1 Class A Designs: (Completely crossed with nested replications)

Single-factor designs (Winer, Chapter 3; Lindquist, Chapter 3)

The single-factor design is the simplest analysis of variance design.
It is often called the one-way analysis of variance or the simple-randomized
groups design. See the Winer and Lindquist references and also Hays,
Chapters 12 and 13, for a detailed discussion of this design.

The program cards for a design with 5 groups of 20 subjects each are
shown below. The groups are considered to be levels of factor 1 and the
subjects are factor 2. The order of the factors is the same on the main
parameter card, on the factor cards, and on the input cards.

BALANOVA5(C)(2)(1)(5)(20).
(0).
(1)(1)(1).
END P

Note that in this design no subject appears in more than one group.
Hence the subject factor (factor 2) is nested within the group factor
(factor 1). Note also that the replication factor is listed as a random
factor while the group factor is fixed. It would be possible to consider the
group factor as a random factor; see, e.g., Winer, pp. 56-63. However, in the
single-factor design the calculations are unchanged regardless of the type
of the main factor, so the choice of fixed or random for the main factor is
immaterial in the input to Balanova 5. The interpretation of the results,
however, depends on the type of factor assumed.

If the groups are of unequal size, the program does not have to be altered,
except that the number of levels of the replication factor must be greater
than or equal to the number of replications in the largest group. For
example, if the groups have sizes 10, 12, 15, 9 and 20, the above program
is still correct.

Factorial designs (Winer, Chapters 5 and 6; Lindquist, Chapters 5, 8, 9, 10)

In these designs, subjects are assigned to groups. Each group is
identified by a set of levels, one level for each main factor in the design.
The levels of a factor may represent different experimental treatments or a
classification of a continuum into discrete levels.

Three examples from Winer and Lindquist are given below. Further
examples of Class A designs may be found in Appendix A.

Winer, pp. 233-238

This is a 2 × 3 factorial design with three observations per cell. The
program cards are:

BALANOVA5(C)(3)(1)(2)(3)(3).
(0).
(0).
(1)(1)(1)(2).
END P

Note that the subject factor (factor 3) is nested in both factor 1 and
factor 2, since any subject is treated by only one level of factor 1 and one
level of factor 2. The replication factor is random while the main factors
are fixed. Either or both of the main factors could be random; the F tests
would vary depending on this choice. Balanova 5 makes the correct tests
depending on the factor type indications. See Winer, pp. 170-174.

Winer, pp. 241-244

This is a 2 × 4 factorial design with unequal cell frequencies. The
factors are specified in the same way as for an equal cell frequency design,
except that the number of levels of the replication factor is set greater
than or equal to the maximum cell frequency. The calculations are carried
out using the method of unweighted means unless the cell frequencies are
proportional, in which case the least-squares analysis of variance
calculations are carried out.

BALANOVA5(C)(3)(1)(2)(4)(5).
(0).
(0).
(1)(1)(1)(2).
END P

As mentioned in Winer (p. 243, below Table 6.3-3), both main factors are
assumed to be fixed.

Lindquist, pp. 226-228

This is a completely crossed design with proportional cell frequencies.
The program cards are:

BALANOVA5(C)(4)(1)(2)(4)(3)(8).
(0).
(0).
(0).
(1)(1)(1)(2)(3).
END P


No special indication is needed that the design is proportional.
Balanova 5 will discover that, for all ijk,

$$N_{ijk} = \frac{N_{iTT}\,N_{TjT}\,N_{TTk}}{\left(N_{TTT}\right)^{2}}$$

where $N_{ijk}$ is the number of subjects in cell ijk and a T in a subscript
indicates a marginal total. For example, cell 122 (the 5th column of the
chart at the bottom of page 226) has $N_{122} = 6$. Now

$$\begin{aligned}
N_{1TT} &= 4+4+4+6+6+6+8+8+8+5+5+5 = 69\\
N_{T2T} &= 6+6+6+6+6+6 = 36\\
N_{TT2} &= 4+6+8+5+4+6+8+5 = 46\\
N_{TTT} &= \text{total number of replications} = 138
\end{aligned}$$

and

$$\frac{N_{1TT}\,N_{T2T}\,N_{TT2}}{\left(N_{TTT}\right)^{2}} = \frac{69 \times 36 \times 46}{(138)^{2}} = 6 = N_{122}.$$

A similar equality will be found for all ijk. Hence the design is
proportional.
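The arithmetic for cell 122 can be reproduced in a few lines of Python. This sketch uses only the counts listed above. It checks this one cell and does not attempt to reconstruct Lindquist's full table.

```python
# Marginal totals built from the counts listed above for cell 122 (Lindquist, p. 226).
n_1TT = sum([4, 4, 4, 6, 6, 6, 8, 8, 8, 5, 5, 5])   # 69
n_T2T = sum([6, 6, 6, 6, 6, 6])                     # 36
n_TT2 = sum([4, 6, 8, 5, 4, 6, 8, 5])               # 46
n_TTT = 138                                         # total number of replications
n_122 = 6

# q = 3 nesting factors, so the grand total is squared (q - 1 = 2).
predicted = n_1TT * n_T2T * n_TT2 / n_TTT ** 2
print(predicted)            # 6.0
print(predicted == n_122)   # True
```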

2.2 Class B designs (other replication designs)

All Class B designs have a replication factor, but at least one of the
two conditions for a Class A design is not satisfied. In repeated measures
designs (below), the replication factor is not nested in all the other
factors in the design. In hierarchical designs (below), some of the
non-replication factors are nested rather than crossed.

Repeated measures designs (Winer, Chapter 7; Lindquist, Chapter 13)

In repeated measures designs, the replication factor is not nested within
all the other factors as it is in Class A designs. The replication factor is
crossed with one or more factors in the design.

Four examples from Winer are given below. Lindquist's Type I, IV and VI
designs are similar to these and are listed in Appendix A. The designs
discussed in Winer, Chapter 4 (Single-factor experiments having repeated
measures), are not considered to be Class B designs, since the subject factor
is not nested in any factor and hence cannot be considered a replication
factor in Balanova 5. Such designs are considered to be Class C designs.

All repeated measures designs require additional assumptions to ensure
the validity of the tests. It is suggested that Scheffe and Winer be
consulted about these assumptions.

Winer, pp. 302-318

This design has two main factors, 1 and 2, with subjects nested within
factor 1. Factor 1 is the group factor. The program cards for the data in
Table 7.2-3 are:

BALANOVA5(C)(3)(1)(2)(4)(3).
(0).
(0).
(1)(1)(1).
END P

Note that factor 3 (subjects) is a random factor and that it is nested only
in factor 1. Also note that although there are six subjects altogether, there
are only 3 within each nest, namely within each level of factor 1, and hence
the number of levels for factor 3 (subjects) is 3. As will be pointed out in
more detail in Chapter 3, the subjects may retain their original numbering
from 1 to 6 rather than being numbered from 1 to 3 in each nest, as would be
the case if factor 3 (subjects) were not a replication factor.

As discussed in Winer, p. 318, either or both of factors 1 and 2 may be
random rather than fixed, in which case Balanova 5 will change the tests
appropriately and print out the appropriate expected mean square table.


Winer, pp. 319-337

This is a repeated measures design with repeated measures on two completely
crossed factors. The program cards for the data in Table 7.4-3 are:

BALANOVA5(C)(4)(1)(2)(3)(3)(3).
(0).
(0).
(0).
(1)(1)(1).
END P

Note that factor 4 (subjects) is nested only within factor 1, while factors
1, 2 and 3 are completely crossed and factor 4 (subjects) is crossed with
factors 2 and 3. Factor 4 (subjects), again, is a random factor.

On page 335, Winer discusses the tests when some of the factors 1, 2 or
3 are random. Balanova 5 follows these rules automatically.

Winer, pp. 337-349

The designs in this section also have three main factors, but the
replication factor is nested in two of them. The program cards are:

BALANOVA5(C)(4)(1)(2)(2)(4)(3).
(0).
(0).
(0).
(1)(1)(1)(2).
END P

Again, the general form of the expected mean squares when some of the
factors are random is given on pp. 347-348. Balanova 5 prints out the
expected mean square table appropriate to the design chosen and makes the
correct tests.

Winer, pp. 374-378

The case of unequal group size is also handled by Balanova 5. If the
number of replications within each nest is proportional, the exact
analysis of variance is performed. Winer calls this the least-squares
solution in Table 7.8-6. Note that if the replication factor is nested in
only one factor, the design is proportional and the least-squares solution
will be performed.

If the number of replications is not proportional, an unweighted-means
analysis is performed. If it is desired in the proportional case to perform
the unweighted-means analysis anyway, an override is available; see the
main program card, parameter 14, in Section 1.5.

The factor specification table for the case of unequal numbers of
replications is identical to that for the equal case, except that the entry
under Number of Levels must be the maximum number of levels of the
replication factor in any one nest. For example, the data in Table 7.8-3
would have the program cards:

BALANOVA5(C)(3)(1)(2)(3)(5).
(0).
(0).
(1)(1)(1).
END P

For these data, the design is proportional, and Balanova 5 would normally
perform the least-squares solution. The unweighted solution could be
performed instead if an override is given. Both solutions (see Tables 7.8-5
and 7.8-6) have been checked with Balanova 5.

Hierarchical designs (Winer, Chapter 5; Lindquist, Chapter 7)

In hierarchical designs, the replication factor is nested in all the
main factors, but not all the main factors are crossed: some of the main
factors are nested. Two designs from Winer are illustrated below. Further
examples from Winer and Lindquist are given in Appendix A.

Winer, pp. 184-187

The hospitals-within-drugs example on p. 184 has the following program
cards if there are n = 20 patients in each hospital.

BALANOVA5(C)(3)(1)(2)(3)(20).
(0).
(0)(0)(1).
(1)(1)(1)(2).
END P

Hospitals (factor 2) are nested within drugs (factor 1), since each hospital
appears with only one drug. The number of levels for hospitals is three
rather than six, since there are only three hospitals in each level of drug.
Patients (factor 3) are nested in both hospitals and drugs, and patients is
the replication factor, since no factor is nested within patients.


Note that the factorial design (Class A) given at the bottom of p. 185
has program cards:

BALANOVA5(C)(3)(1)(2)(6)(10).
(0).
(0).
(1)(1)(1)(2).
END P

Hospitals (factor 2) are no longer nested in drugs (factor 1), and the number
of levels of hospitals (factor 2) is now 6 rather than 3. Furthermore, there
are only 10 patients in each cell.
The partially hierarchical design on p. 186 (Table 5.12-1) has the
following program cards (for n = 20):

BALANOVA5(C)(4)(1)(2)(3)(2)(20).
(0).
(0)(0)(1).
(0).
(1)(1)(1)(2)(3).
END P

Factors 1, 2 and 3 are called factors A, B and C in Winer.

2.3 Class C designs (no replication factor)

Class C designs have no replication factor and hence must be balanced.
The factors may be crossed or nested. Several examples are given below,
and more are given in Appendix A.

Winer, pp. 111-116

The designs in Winer, Chapter 4, are repeated measures designs, but the
subject factor is not nested. Therefore the subject factor cannot be
considered a replication factor, and the design cannot be Class B. The
design on pp. 111-116 has the following program cards:

BALANOVA5(C)(2)(1)(4)(5).
(0).
(1).
END P

Note that the person factor (factor 2) is of random type. At least one
factor in any design must be random for there to be a denominator term for
an F ratio. Factors 1 and 2 are completely crossed; therefore, no nesting is
indicated. Also, there is no replication factor, so this parameter is left
blank. Balanova 5 will make the correct test in this design, namely testing
the drug factor, factor 1, against the interaction mean square.

The rationale for the test is in Winer, pp. 116-124. The test of
homogeneity of covariance is not made by Balanova 5, just as no test of
homogeneity of variance is made for any design input to Balanova 5.


Winer, p. 289

The Aborn et al. design has no replication factor and hence is a Class C
design. The analysis, after the transformation of the data and estimation
of missing data, involves pooling the interactions into one mean square and
using the pooled estimate as the denominator term to test the three main
effects. This procedure cannot be done automatically in Balanova 5. The
following steps are suggested. Use the following program:

BALANOVA5(C)(3)(1)(6)(4)(3).
(0).
(0).
(1).
END P
Note that one of the factors is stated to be random. This is done solely
in order to allow Balanova 5 to function, since the program requires that
there be a legal denominator term. If all factors were fixed, there would
be no denominator. The summary table printed by Balanova 5 will be used
only for the sums of squares, not for the F ratios. Those ratios will be
incorrect, since they are based on a model having a random factor, whereas
in the real design no factor is random. Pool the interaction sums of
squares, form the mean squares and carry out the correct F tests by hand.
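The hand calculation can be sketched in Python. The sums of squares below are made-up placeholders, not Winer's or Aborn's values. Only the degrees of freedom follow from the 6 × 4 × 3 layout on the call card: 5, 3 and 2 for the main effects, and 15, 10, 6 and 30 for the interactions. The function name and argument layout are hypothetical.

```python
def pooled_f_ratios(main_effects, interactions):
    """Pool interaction SS into one error mean square and form F = MS_effect / MS_pooled.

    Both arguments map a source name to (sum_of_squares, degrees_of_freedom),
    as read from the BALANOVA 5 summary table; its printed F ratios are ignored.
    """
    ss_pooled = sum(ss for ss, _ in interactions.values())
    df_pooled = sum(df for _, df in interactions.values())
    ms_pooled = ss_pooled / df_pooled
    ratios = {name: (ss / df) / ms_pooled for name, (ss, df) in main_effects.items()}
    return ratios, df_pooled


# Hypothetical sums of squares; degrees of freedom follow the 6 x 4 x 3 design.
main = {"A": (50.0, 5), "B": (12.0, 3), "C": (8.0, 2)}
inter = {"AB": (30.0, 15), "AC": (20.0, 10), "BC": (12.0, 6), "ABC": (60.0, 30)}
ratios, df_error = pooled_f_ratios(main, inter)
print(ratios, df_error)   # {'A': 5.0, 'B': 2.0, 'C': 2.0} 61
```

Each ratio would then be referred to an F table with the effect's degrees of freedom and the pooled degrees of freedom (61 here).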

The preliminary square root transformation of the data and estimation of
missing data must be done by a Transformations program.

Nested designs of Class C

The impression may have been given that no nested factors are allowed in
Class C designs. This is not the case. Class C designs have no replication
factor (which would be a nested factor), but factors other than replication
factors can be nested. Consider the following design:

BALANOVA5(C)(3)(1)(3)(3)(10).
(0).
(0)(0)(1).
(1).
END P

A concrete realization is perhaps hard to give, but this design means simply
that factor 2 is nested in factor 1 and both factors are crossed with factor 3
(subjects).


Chapter 3. Preparation of Input

3.1 Introduction

The following rules apply to the assignment of factor levels in all
types of designs. There is usually no need to rekeypunch existing data,
however, as it is almost always possible to create the factor levels in
the TRANSFORMATIONS program. If you need help using this program, see a
SOUPAC consultant.

(a) Non-replication Factors in Class A, B and C designs

The levels for non-replication factors must run from one (1)
consecutively up to the number of levels given on the BALANOVA call card.
E.g., if a factor represents four treatment groups, these groups must be
numbered 1, 2, 3 and 4, and each subject's row or rows in the data matrix
must have a 1, 2, 3 or 4 punched to indicate the group he is in. If the
factor is nested, the level numbers must run from one (1) up in each
cell of the nest. See the example given in Section 3.2.
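A rule like this is easy to verify before a run. The Python sketch below is hypothetical: the function and its column-index arguments are not part of SOUPAC. It assumes each data row has already been read into a tuple of factor levels. It reports every nest in which a non-replication factor's levels are not exactly 1 through the number of levels given on the call card.

```python
from collections import defaultdict


def bad_nests(rows, column, n_levels, nested_in=()):
    """Return the nests whose levels for `column` are not exactly 1..n_levels.

    rows: tuples of factor levels; column and nested_in are 0-based column
    positions (nested_in lists the columns of the factors this one is nested in).
    """
    seen = defaultdict(set)
    for row in rows:
        nest = tuple(row[c] for c in nested_in)  # () when the factor is not nested
        seen[nest].add(row[column])
    expected = set(range(1, n_levels + 1))
    return sorted(nest for nest, levels in seen.items() if levels != expected)


# Hospitals (column 1) nested in drugs (column 0), three hospitals per drug.
good = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
bad = [(1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (2, 6)]   # drug 2 numbered 4-6
print(bad_nests(good, 1, 3, nested_in=(0,)))   # []
print(bad_nests(bad, 1, 3, nested_in=(0,)))    # [(2,)]
```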

(b) Replication Factors in Class A designs

The replication numbers (level numbers) can be anything, for example,
a subject identification number. The subject numbers do not have to be
unique either within a group or between groups. In fact, in Class A designs
the replication level is not used, but it must nevertheless appear, even if
it is a dummy. This statement does not apply to other design classes.

(c) Replication Factors in Class B designs, of repeated measures type

Special care must be taken with the replication levels in these designs.
Let us divide the non-replication factors into two groups:

α-set: those factors in which the replication factor is nested.

β-set: all other factors, i.e. those factors crossed with the
replication factor.

If the β-set is empty, the design is of hierarchical rather than repeated
measures type. See paragraph (d) below.

Let us denote by an α-cell a particular set of levels of the factors in
the α-set. The replications in this cell may be any values (not necessarily
from one (1) up) but must be distinct. Again, the numberings in two different
α-cells do not have to be distinct, but can be. In other words, if the
replication factor is subjects, an identification number may be used as the
replication level. Now each subject appears in more than one row (card) of
the data matrix, since each subject appears with every combination of levels
of the factors in the β-set. It follows that every row that refers to the
same subject must have the same replication number. This is the only way
that BALANOVA 5 can tell that two different rows refer to the same subject.
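The role of the replication number can be illustrated with a small Python check. It is a hypothetical sketch, not BALANOVA 5 code. It identifies a subject by its α-cell together with its replication number, so equal numbers in different α-cells name different subjects. It then lists any subject that does not appear exactly once with every combination of β-set levels. The column layout is an assumption modeled on the Winer, pp. 302-318 example: group (α-set), factor 2 (β-set, 4 levels), subject number.

```python
from collections import defaultdict
from itertools import product


def incomplete_subjects(rows, alpha, beta, rep, beta_levels):
    """List subjects missing, or repeating, some combination of beta-set levels.

    alpha, beta: 0-based columns of the alpha-set and beta-set factors;
    rep: column of the replication number; beta_levels: levels per beta factor.
    """
    combos = defaultdict(list)
    for row in rows:
        subject = (tuple(row[c] for c in alpha), row[rep])  # alpha-cell + replication number
        combos[subject].append(tuple(row[c] for c in beta))
    full = list(product(*(range(1, n + 1) for n in beta_levels)))
    return sorted(s for s, seen in combos.items() if sorted(seen) != full)


# Two groups, subjects numbered 1-6 across groups, four levels of factor 2 each.
rows = [(g, b, s) for g, subjects in ((1, (1, 2, 3)), (2, (4, 5, 6)))
        for s in subjects for b in range(1, 5)]
print(incomplete_subjects(rows, (0,), (1,), 2, (4,)))   # []
rows.remove((1, 3, 2))                                  # drop subject 2's row for level 3
print(incomplete_subjects(rows, (0,), (1,), 2, (4,)))   # [((1,), 2)]
```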
(d) Replication Factors in Class B designs, of hierarchical type

The subject numbers must all be different within any one cell (one
level set of the non-replication factors) but may be the same over
different cells.

Special  note  on  missing  data 

If  a  dependent  variable" field  on  a  card  is  totally  blank,  BALANOVA  5 
does  not  include  the  score  in  the  analysis  for  the  given  dependent  variable. 
However  other  non-blank  dependent  variable  fields  on  the  same  card  will  be 
included  in  their  respective  analyses. 

Do  not  confuse  this  deletion  of  missing  data  with  an  error  comment  by 
BALANOVA  5  to  the  effect  that  there  is  no  data  cell  A  =  1,  B  =  2.   This 
comment  means  that  no  data  card  with  A  =  1,  B  =  2  had  non-blank  data  for 
the  given  dependent  variable. 
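A minimal Python sketch of this per-variable deletion follows. It is illustrative only: it assumes the cards have already been read into tuples of factor levels followed by dependent-variable values, with a totally blank field represented as None (our convention, not SOUPAC's).

```python
def split_by_dependent_variable(cards, n_factors):
    """Return one list of (levels, score) pairs per dependent variable.

    A blank score (None) is dropped only from the analysis of its own
    variable; the other scores on the same card are kept.
    """
    n_dv = len(cards[0]) - n_factors
    per_dv = [[] for _ in range(n_dv)]
    for card in cards:
        levels, scores = tuple(card[:n_factors]), card[n_factors:]
        for k, score in enumerate(scores):
            if score is not None:        # blank field: skip for this variable only
                per_dv[k].append((levels, score))
    return per_dv


cards = [
    (1, 1, 1, 20.0, 19.0),
    (1, 2, 1, None, 8.0),   # first dependent variable blank on this card
    (1, 3, 1, 4.0, 4.0),
]
dv1, dv2 = split_by_dependent_variable(cards, n_factors=3)
print(len(dv1), len(dv2))  # 2 3
```

If, after this deletion, some cell has no card left for a variable, that is the situation in which BALANOVA 5 reports that there is no data for the cell.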

3.2  Data matrix examples

Class  A  design 

In all the following examples, it is assumed that the data are stored on
sequential file number 1 (SEQUENTIAL 1).

Consider  a  two-way  design  with  three  subjects  in  each  cell.   For  the 
purpose  of  BALANOVA,  subjects  are  also  considered  to  be  a  factor,  the 
replication  factor.   Suppose  that  there  are  two  dependent  variables,  and 
further  that  the  factor  specification  cards  are  listed  in  the  order  given 
below,  following  the  main  program  card. 

BALANOVA(SEQUENTIAL 1)(3)(2)(2)(3)(3).
(0)(0).
(1)(1)(1)(3).
(0)(0).
END PROGRAM

Note that, contrary to the usual case, the replication is the second factor.
This illustrates one flexible feature of BALANOVA 5.  A data matrix could be

    A   Rep.   C   DV 1   DV 2
    1     1    1     20     19
    1     2    1      8      8
    1     3    1      4      4
    1     4    2     -3      6
    1     5    2      4     10
    1     6    2      2      3
    1     7    3      4      4
    1     8    3      6      2
    1     9    3      8      4
    2    10    1      2      7
    2    11    1     -4      8
    2    12    1     25      2
    2    13    2    126     15
    2    14    2      2     20
    2    15    2      3      3
    2    16    3      4      4
    2    17    3      5     -1
    2    18    3      3      2

Note  that  the  first  column  is  the  A  level,  the  third  column  is  the  C  level 
and  the  second  column  is  the  replication  level,  which  in  Class  A  designs  can 
be  anything.   The  last  two  columns  are  the  dependent  variables.   Each  row 
of  the  data  matrix  would  be  punched  on  one  or  more  cards.  A  possible  format 
would  be  (3F5.0,3X,2F6.0). 
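The format can be made concrete with a hypothetical Python sketch that cuts one card image into fields according to (3F5.0,3X,2F6.0): three 5-column fields, three skipped columns, and two 6-column fields. It approximates FORTRAN formatted input only for plain numbers (no implied decimal points, exponents or embedded blanks) and returns None for a totally blank dependent-variable field, following the special note on missing data.

```python
def parse_card(card):
    """Split one card image laid out as (3F5.0,3X,2F6.0).

    Columns  1-15: A level, replication level, C level (5 columns each)
    Columns 16-18: skipped (3X)
    Columns 19-30: two dependent variables (6 columns each)
    A totally blank dependent-variable field is returned as None (missing).
    """
    card = card.ljust(80)            # pad short lines to a full 80-column card
    widths = [5, 5, 5, -3, 6, 6]     # a negative width means "skip that many columns"
    fields, pos = [], 0
    for w in widths:
        if w < 0:
            pos -= w                 # skip columns (3X)
            continue
        fields.append(card[pos:pos + w])
        pos += w
    levels = [int(float(f)) for f in fields[:3]]
    scores = [float(f) if f.strip() else None for f in fields[3:]]
    return levels, scores


print(parse_card(f"{1:5d}{1:5d}{1:5d}{'':3}{20:6d}{19:6d}"))
# ([1, 1, 1], [20.0, 19.0])
print(parse_card(f"{2:5d}{11:5d}{1:5d}{'':3}{'':6}{8:6d}"))
# ([2, 11, 1], [None, 8.0])
```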

The  order  of  the  rows  is  immaterial.   They  could  be  in  any  order  and 
have  been  written  in  a  systematic  order  only  for  convenience. 

Class  B  design  -  repeated  measures 

The data in Winer, Table 7.2-3, could be analyzed with the following program,
using three factors and one dependent variable.

BALANOVA(SEQUENTIAL 1)(3)(1)(2)(4)(3).
(0)(0).
(0)(0).
(1)(1)(1).
END PROGRAM


Data Cards:

    1   1   1   0
    1   2   1   0
    1   3   1   5
    1   4   1   3
    1   1   2   3
    1   2   2   1
    1   3   2   5
    1   4   2   4
       etc.
    2   1   5   5
    2   2   5   4
    2   3   5   6
    2   4   5   6
    2   1   6   7
    2   2   6   5
    2   3   6   8
    2   4   6   9

Again, the rows could be in any order.  The subjects in the second level of factor
A could be assigned level numbers 1, 2, 3 or any other three distinct numbers.
A possible format for this matrix is (2F5.0,F6.0,1X,F7.0).

Another repeated measures design is Winer, Table 7.4-3.  Here we have
four factors and one dependent variable.




BALANOVA(SEQUENTIAL 1)(4)(1)(2)(2)(3)(4).
(0)(0).
(0)(0).
(1)(1)(1)(2).
(0)(0).
END PROGRAM


Data Cards:

    1   1    1   1   18
    1   1    1   2   14
    1   1    1   3   12
    1   1    1   4    6
    1   1    2   1   19
    1   1    2   2   12
    1   1    2   3    8
    1   1    2   4    4
         etc.
    1   2    6   1   18
    1   2    6   2   10
    1   2    6   3    5
    1   2    6   4    1
    2   1    7   1   16
    2   1    7   2   10
    2   1    7   3    8
    2   1    7   4    4
         etc.
    2   2   12   1   16
    2   2   12   2   12
    2   2   12   3    8
    2   2   12   4    8

The assignment of levels to factors A, B, D (columns 1, 2, 4) has to be as
shown above, but the subject numbers can be changed provided that, for each
(A,B)  cell,  no  two  subjects  have  the  same  number.   A  possible  format  is 
(2F5.0,1X,2F5.0,F12.0).

Class  B  design  -  hierarchical 


Consider the following design with three factors and one dependent variable.


BALANOVA(SEQUENTIAL 1)(3)(1)(2)(2)(4).
(0)(0).
(0)(0)(1).
(1)(1)(1)(2).
END PROGRAM

This is a hospitals (factor 2) within drugs (factor 1) design, illustrated
below, with unequal cell sizes.
Data Cards:

    1   1   1   5.0
    1   1   2   4.2
    1   2   1   5.6
    1   2   2   3.2
    1   2   3   4.6
    2   1   1   5.3
    2   1   2   8.2
    2   1   3   4.3
    2   1   4   6.3
    2   2   1   5.7
    2   2   2   6.8

Note that the hospitals are numbered 1, 2 in each level of factor 1 even though
there are four different hospitals involved.  This is necessary since factor 2
is a non-replication factor (see the rules about Data Cards in Section 3.1).
There are 2 patients in hospital 1 for drug 1, 3 patients in hospital 2 for
drug 1, 4 patients in hospital 1 for drug 2 and 2 patients in hospital 2 for
drug 2.  The patient numbering is flexible: it could be a different number
for every patient, regardless of hospital.  A possible format is (3F5.0,F7.1).

Class  C  design 

The example in Winer, Table 4.3-1, could be set up as follows, with two factors
and one dependent variable.

BALANOVA(SEQUENTIAL 1)(2)(1)(5)(4).
(1)(0).
(0)(0).
END PROGRAM


Data Cards:

    1   1   30
    1   3   16
    2   1   14
    2   2   18
    2   3   10
    2   4   22
    1   2   28
    3   1   24
    3   2   20
    3   3   18
    3   4   30
    4   1   38
    1   4   34
    4   2   34
    4   3   20
    4   4   44
    5   4   30
    5   3   14
    5   2   28
    5   1   26
The rows have been written in a non-systematic order to emphasize that,
without exception, in BALANOVA 5 the data rows can be in any order.  A
possible format is (2F5.0,F10.0).
Chapter 4.      Program Details

4.1     Method and program flow

The program follows the procedures in Scheffe (1959), Chapter 8.  Scheffe's
discussion will not be repeated here; only a general description of the
program flow will be given.  The names of the subroutines used are indicated
in case reference is made to the program listing.  Many of the minor steps
and subroutines are not described.

1.  Main  Program 

Calls  subroutines  and  routes  your  data  through  the  program. 

2.  Design input and check (INPUTD)

The Factor Specification Cards are read and checked for errors.  Many
of the error conditions mentioned in Section 4.3 are checked in INPUTD.
The design is transformed into the symbolic notation of live, dead and
absent subscripts as in Scheffe.

3.  Derivation  of  all  legal  sources  (LEGALS,NEWS) 

All  possible  interactions  are  generated  but  only  one  interaction  with 
a  given  set  of  subscripts  is  retained.   The  procedure  is  identical  to  Scheffe, 
p.  277,  para.  1.   The  program  now  has  a  list  of  all  legal  sources  (including 
the  original  factors). 

4.  Expected mean squares (AUXIL, EMS)

The expected mean squares for each source are, of course, not computable
numbers but symbolic expressions (cf. last column, Table 8.2.2,
Scheffe).  The program generates and prints these expressions in a form very
close to the normal printed form.  The method is from Scheffe, pp. 284-8.

5.   Denominator for each source (FINDEN)

By  the  standard  procedures,  using  the  expected  mean  squares,  the  program 
determines  the  correct  denominator  (if  any)  for  each  source. 
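The idea behind this standard procedure can be sketched in Python. Everything here is an assumption for illustration, not the FINDEN algorithm: a balanced two-factor design with A fixed, B random and n replications per cell, expected mean squares in the restricted (Scheffe) mixed-model form, and each expected mean square reduced to the set of variance components it contains, with coefficients ignored. The denominator for a source is then the source whose expected mean square equals the numerator's with the tested component removed.

```python
# Expected mean squares for a balanced design with A fixed (a levels),
# B random (b levels) and n replications per cell (restricted mixed model):
#   E(MS_A)      = s2_e + n*s2_AB + n*b*theta2_A
#   E(MS_B)      = s2_e + n*a*s2_B
#   E(MS_AB)     = s2_e + n*s2_AB
#   E(MS_within) = s2_e
# Coefficients are dropped; each EMS is kept as the set of components it contains.
EMS = {
    "A": frozenset({"e", "AB", "A"}),
    "B": frozenset({"e", "B"}),
    "AB": frozenset({"e", "AB"}),
    "Within": frozenset({"e"}),
}


def find_denominator(source, ems):
    """Return the source whose EMS equals EMS(source) minus the source's own
    component, or None when no such source exists (no F test)."""
    target = ems[source] - {source}
    for other, components in ems.items():
        if other != source and components == target:
            return other
    return None


for s in EMS:
    print(s, "tested against", find_denominator(s, EMS))
```

Dropping the coefficients is only safe when they match, as in balanced designs; a source for which nothing qualifies (here the within-cell source) simply gets no test.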

6.   Sorting  of  sources  for  summary  table  (SORT) 

The  sources  are  sorted  in  a  convenient  order,  combining  all  sources 
with the same denominator.  This order is then used in printing the summary
table.

7.   Input of data (INPUTX)

The input data are read from the input device (the first parameter on the
main program card).  The grand means for each dependent variable are computed,
ignoring missing (blank) data.
8.   Storage of data for one dependent variable (READX)

This routine, as well as all the remaining ones, is executed in a cycle,
once for each dependent variable.  READX stores the data in core and checks
that no data are missing in the design.  The data are actually stored as
deviations from the grand mean.  This is done to improve accuracy.  See
Section 4.2.

9.   Check of replication numbers (CELLN)

In the case of Class A and B designs, BALANOVA 5 checks whether the
cell frequencies are equal, proportional or non-proportional.
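For a two-way crossed layout, the three cases can be told apart with integer arithmetic, as in the following Python sketch. It uses the usual textbook definition of proportional frequencies, n_ij = n_i. n_.j / n; how CELLN itself handles more factors or nested designs is not described here, so this is only an illustration.

```python
def classify_frequencies(counts):
    """Classify the cell frequencies of a two-way crossed layout.

    counts[i][j] is the number of replications in cell (i, j).  Frequencies are
    proportional when n_ij * n == n_i. * n_.j for every cell (integer
    arithmetic, so there are no rounding issues).
    """
    cells = [n for row in counts for n in row]
    if len(set(cells)) == 1:
        return "equal"
    total = sum(cells)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    proportional = all(
        counts[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )
    return "proportional" if proportional else "non-proportional"


print(classify_frequencies([[3, 3], [3, 3]]))   # equal
print(classify_frequencies([[2, 4], [3, 6]]))   # proportional
print(classify_frequencies([[2, 3], [4, 2]]))   # non-proportional
```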

10.  Computation  of  sum  of  squares  (SSEQU, SSPROP,XMEAN) 

The  marginal  means  and  sums  of  squares  for  each  legal  source  are 
calculated. 

11.  Computation  and  printing  of  final  summary  table  (FISHER, FPRINT) 
These  calculations  are  made  in  the  standard  way. 

4.2     Some comments on accuracy of computation

An  attempt  was  made  in  the  design  of  this  program  to  eliminate  the 
largest  sources  of  computational  inaccuracies  that  can  occur  in  analysis  of 
variance  calculations. 

Consider  a  one-way  analysis  with  the  following  data: 

              Group 1   Group 2   Group 3
               8.77      8.88      8.96
               8.79      8.90      8.99
               8.80      8.90      9.00
               8.82      8.91      9.02
               8.82      8.91      9.03

    Means      8.80      8.90      9.00


Sums  of  squares  computations  are  generally  made  as  the  sums  and 
differences  of  two  or  more  terms.   In  this  example,  the  exact  calculations 
would  be 

SS between = 1188.2500 - 1188.1500
           = 0.1000

SS within  = 1188.2554 - 1188.2500
           = 0.0054

Note  that  these  answers  each  have  a  string  of  zeros  following  the  given  digits 
since  they  are  exact.   However  on  a  computer,  with  about  8  digit  accuracy,  the 
differences  would  only  be  accurate  to  about  four  decimals  due  to  the  cancellation 
of  all  the  higher  order  digits  by  subtraction. 
This is illustrated by a calculation using the previous analysis of
variance program in SOUPAC.  SOUPAC's answers were:


SS between = 0.10000610      (5 significant digits)
SS within  = 0.0053863525    (2 significant digits)


Note  the  large  errors  in  these  SS.   Even  worse  errors  can  occur  in  other  data. 

The whole problem could be avoided by accumulating a true sum of squares,
that  is,  by  adding  positive  numbers  to  form  each  SS  rather  than  taking  a 
difference  of  two  large  numbers.   However  this  procedure  was  rejected  because 
it  is  extremely  slow. 

The following procedure is used in BALANOVA 5 and it is very effective.
The data are internally transformed to deviations from the grand mean.  This
is why the grand means are computed in subroutine INPUTX before the deviations
are actually stored in memory in subroutine READX.  When the deviations
are used, the individual terms which are added and subtracted to give each
SS are numbers of approximately the same size as the SS itself.  This
means that the number of significant digits in the SS is large even if the
grand mean is large.  In the example given above, the deviation scores are:


              Group 1   Group 2   Group 3
               -.13      -.02      +.06
               -.11       .00      +.09
               -.10       .00      +.10
               -.08      +.01      +.12
               -.08      +.01      +.13

    Means      -.10       .00      +.10


and  the  SS  are  computed  as 


SS between = 0.10000000 - 0.00000000
           = 0.10000000

SS within  = 0.10540000 - 0.10000000
           = 0.00540000

The  actual  results  produced  by  BALANOVA  5  were 

SS  between  =  0.099999998  (8  significant  digits) 

SS  within  =  0.0053999992         (7  significant  digits) 

Note  the  great  improvement  in  accuracy. 
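The effect can be reproduced with a short Python sketch. Python floats are double precision, so the sketch rounds every intermediate result to IEEE single precision with the standard struct module; this only approximates the IBM 360 arithmetic (which used hexadecimal floating point), and the functions are illustrative, not BALANOVA 5's code.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def f32_sum(values):
    """Add values one at a time, rounding the running total to single precision."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total


def one_way_ss(groups):
    """SS between and SS within by the usual difference-of-terms formulas,
    rounding every intermediate result to single precision."""
    n_total = sum(len(g) for g in groups)
    group_totals = [f32_sum(g) for g in groups]
    correction = f32(f32(f32_sum(group_totals) ** 2) / n_total)      # T^2 / N
    group_term = f32_sum(f32(f32(t * t) / len(g))                    # sum T_g^2 / n_g
                         for t, g in zip(group_totals, groups))
    raw_term = f32_sum(f32(x * x) for g in groups for x in g)         # sum x^2
    return f32(group_term - correction), f32(raw_term - group_term)


def one_way_ss_deviations(groups):
    """The same formulas applied to deviations from the grand mean."""
    n_total = sum(len(g) for g in groups)
    grand_mean = f32(f32_sum(x for g in groups for x in g) / n_total)
    return one_way_ss([[f32(x - grand_mean) for x in g] for g in groups])


groups = [[f32(v) for v in g] for g in (
    [8.77, 8.79, 8.80, 8.82, 8.82],
    [8.88, 8.90, 8.90, 8.91, 8.91],
    [8.96, 8.99, 9.00, 9.02, 9.03],
)]
raw_between, raw_within = one_way_ss(groups)
dev_between, dev_within = one_way_ss_deviations(groups)
print("raw data:  ", raw_between, raw_within)   # exact values: 0.1 and 0.0054
print("deviations:", dev_between, dev_within)
```

With the raw data, both terms of SS within lie near 1188, where single precision can only represent multiples of about 0.00012, so their difference cannot come closer to 0.0054 than about 0.00003. With deviations the terms are near 0.1 and the error stays far smaller.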

As a final feature of BALANOVA 5, the approximate number of significant
digits in each SS is calculated and printed alongside each SS.  These numbers
should not be interpreted exactly but only as a warning when they are small.
The approximate number of significant digits is calculated in the following way:

(a)  Find the largest term, in absolute value, entering into the
     calculation of the SS.  In the example above, the largest
     term in SS within is the first term (0.10540000).

(b)  Take the ratio of this largest term to the SS itself.  In the
     example, this ratio = 0.10540000/0.00540000.

(c)  The approximate number of significant digits is then 8.0 - log10
     (ratio).  In the example, this is 8.0 - 1.29 = 6.71, which is
     printed by BALANOVA 5 as 7, a pretty good estimate.
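Steps (a) to (c) amount to a one-line formula. The Python sketch below reproduces the worked example; the function name and the guard for an SS of exactly zero are our own choices, not BALANOVA 5's.

```python
import math


def approx_significant_digits(terms, ss, machine_digits=8.0):
    """Steps (a)-(c): machine digits minus log10(largest |term| / |SS|).

    terms          -- the terms that are added or subtracted to form the SS
    ss             -- the resulting sum of squares
    machine_digits -- digits carried by the arithmetic (8.0, as in step (c))
    """
    if ss == 0:
        return 0.0                                  # our guard against division by zero
    largest = max(abs(t) for t in terms)            # step (a)
    ratio = largest / abs(ss)                       # step (b)
    return machine_digits - math.log10(ratio)       # step (c)


digits = approx_significant_digits([0.10540000, 0.10000000], 0.00540000)
print(f"{digits:.2f} -> printed as {round(digits)}")   # 6.71 -> printed as 7
```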

Note  that  the  number  of  significant  digits  printed  by  BALANOVA  5  reflects 
the  loss  of  accuracy  in  the  computation  of  the  SS  from  two  or  more  terms. 
It  does  not  reflect  loss  of  accuracy  due  to  computation  of  the  terms  them- 
selves. 

4.3  Error conditions

BALANOVA  5  makes  a  detailed  check  to  insure  that  the  design  is  legal, 
that  none  of  the  computer  storage  arrays  are  exceeded  and  that  all  the  data 
corresponds  to  cells  within  the  specified  design.   The  following  general 
types  of  errors  are  distinguished  and  corresponding  error  messages  are 
printed  giving  detailed  instructions  about  how  to  correct  the  error. 

1.  One  of  the  restrictions  on  program  size  has  been  exceeded.   These 
restrictions  are: 

(a)  maximum number of factors = 10

(b)  maximum number of legal sources = 100

(c)  maximum size of X-storage array (used for data, means and
     cell numbers) depends on region

(d)  maximum number of dependent variables = 200

(e)  maximum number of sigma-squared terms in any one expected
     mean square = 10

2.  The factor specification cards are incorrect or inconsistent.
That is, the design is illegal.  The checks made are:

(a)  all  nested  factors  must  be  listed  as  a  factor. 

(b)  no  factor  may  be  nested  within  itself. 

(c)  at  most  one  factor  can  be  the  replication  factor. 
Furthermore,  the  replication  factor  must  be  nested 
in  at  least  one  other  factor  and  no  factor  can  be 
nested  in  the  replication  factor. 
(d)  the factor type must be fixed or random.

(e)  the maximum number of levels for each factor must be
     more than one.

(f)  there must be at least one denominator term in the
     analysis of variance summary table.  If this is not
     the case, it is probably due to no factor being designated
     as a random factor.


3.  A  Data  Card  has  a  level  set  which  exceeds  the  limits  stated  in 
the  maximum  number  of  levels  on  the  Factor  Specification  Cards. 

4.   Once the data for a dependent variable have been read in, a
detailed check is made to insure that all cells in the design are filled.
If one cell is not, the calculation for that dependent variable is deleted and the
program moves on to the next dependent variable after printing sufficient
information for the user to locate the missing datum.  An additional check
for Class B and C designs is made to insure that data for a given subscript
set is not read in twice.  If two data cards specify the same level set,
a  comment  is  made  to  this  effect  and  the  calculations  for  the  dependent 
variable  are  deleted.   Note  that  both  the  checks  mentioned  in  this  paragraph 
are  made  independently  for  each  dependent  variable  and  are  made  after  the 
missing  data  (blank  fields)  for  that  dependent  variable  have  been  deleted. 
Errors  referred  to  in  this  paragraph  are  not  fatal  and  the  program  proceeds 
to  the  next  dependent  variable. 
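The two checks described in paragraph 4 can be sketched in Python for the simplest case. This hypothetical helper assumes a fully crossed layout with exactly one card per cell (a Class C design such as the Winer Table 4.3-1 example) and is applied to one dependent variable after its blank fields have been removed; Class A and B designs, where replication numbers vary, would need a different rule.

```python
from collections import Counter
from itertools import product


def check_cells(level_sets, max_levels):
    """Find empty cells and duplicated level sets for one dependent variable.

    level_sets -- level tuples of the cards whose score for this variable is
                  not blank (missing data already removed)
    max_levels -- maximum number of levels of each factor, in card order
    Assumes a fully crossed layout with exactly one card per cell.
    """
    counts = Counter(level_sets)
    every_cell = product(*(range(1, m + 1) for m in max_levels))
    missing = [cell for cell in every_cell if counts[cell] == 0]
    duplicated = [cell for cell, k in counts.items() if k > 1]
    return missing, duplicated


cards = [(1, 1), (1, 2), (2, 1), (2, 1)]   # cell (2, 1) punched twice, (2, 2) absent
missing, duplicated = check_cells(cards, max_levels=(2, 2))
print("missing:", missing)        # missing: [(2, 2)]
print("duplicated:", duplicated)  # duplicated: [(2, 1)]
```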

5.   About  one  dozen  other  checks  are  made.   They  should  always  be 
passed  satisfactorily  since  the  design  is  first  checked  as  above.   These 
additional  checks  were  inserted  to  assist  in  debugging  the  program  and 
if one of them fails it indicates a remaining error in the program.  A
message to this effect is printed in these cases.

4.4     Program Checkout

BALANOVA 5 has been checked on a large number of designs.  Among these,
the following calculations were reproduced by BALANOVA 5:

1.  Lindquist, p. 266, Class A.

2.  Winer, Table 7.8-3 (p. 376), both least-squares and unweighted
means, Class B.

4.5     Number of levels of the replication factor

The rules in Section 1.5 and Chapter 3 are strict in the sense that,
if they are followed, BALANOVA 5 will execute correctly.  However, the rules
may be relaxed or ignored in the case of the number of levels of the
replication factor and it is sometimes convenient to do so.

In Class A designs, any number of levels of the replication factor may
be punched on the Main Parameter Card provided the number is at least 2.  This
is so because in Class A designs only cell means are stored and the program
does not check the replication number anyway.  This rule relaxation is
useful when it is inconvenient for the user to calculate ahead of time how
many subjects are in each cell.

In Class B designs, any number of levels of the replication factor may
be punched on the Main Parameter Card provided

(a)  the number is at least the maximum number of replications in
     any one nest, and

(b)  the number is not so large that the restriction on the
     size of the X matrix is exceeded.

Again, this rule relaxation saves the user from having to know the maximum
number of replications before using BALANOVA 5, provided he knows an upper
limit.  The restriction on the size of the X matrix will not often be
exceeded.  However, the user has not been informed, in this manual, how to
estimate the size of the X matrix needed, since this limit is complicated to
specify.

4.6     Restrictions

1.  No Latin-square designs may be run using Balanova 5.

2.  No  partially  crossed  designs  may  be  run. 

3.  One  and  only  one  replication  factor  is  allowed  but  one  may  use 
random  factors  as  needed  to  a  maximum  of  10  total  factors. 

4.     Missing data may not cause a cell to disappear.

5.   Factor  levels  on  all  but  the  replication  factor  must  be 
numbered  1  through  the  maximum,  without  skips. 
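Restriction 5 is easy to verify before a job is submitted. The Python sketch below is a hypothetical pre-check (not part of SOUPAC) that reports, for every non-replication factor, level numbers outside 1 through the maximum and level numbers that are never used; the example uses the hospitals-within-drugs layout, in which the hospitals are numbered 1, 2 within each drug.

```python
def check_level_numbering(rows, max_levels, replication_idx=None):
    """Hypothetical check of restriction 5 (and of error condition 3).

    rows            -- tuples of factor levels, one per card (scores omitted)
    max_levels      -- maximum number of levels of each factor, in card order
    replication_idx -- position of the replication factor, which is exempt
    """
    problems = []
    for i, m in enumerate(max_levels):
        if i == replication_idx:
            continue
        used = {r[i] for r in rows}
        outside = sorted(v for v in used if not 1 <= v <= m)
        skipped = sorted(set(range(1, m + 1)) - used)
        if outside:
            problems.append(f"factor {i + 1}: levels outside 1-{m}: {outside}")
        if skipped:
            problems.append(f"factor {i + 1}: levels never used: {skipped}")
    return problems


# Hospitals-within-drugs layout: drug, hospital, patient.
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3),
        (2, 1, 1), (2, 1, 2), (2, 1, 3), (2, 1, 4), (2, 2, 1), (2, 2, 2)]
print(check_level_numbering(rows, (2, 2, 4), replication_idx=2))   # []

# Numbering the hospitals of drug 2 as 3 and 4 breaks the rule.
bad = [(d, h + 2 if d == 2 else h, p) for d, h, p in rows]
print(check_level_numbering(bad, (2, 2, 4), replication_idx=2))
# ['factor 2: levels outside 1-2: [3, 4]']
```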
APPENDIX A - KEY TO DESIGNS IN WINER AND LINDQUIST


To facilitate the use of Balanova 5 in conjunction with Winer and
Lindquist, a large number of designs from these two books are listed below.
All designs previously described in Chapter 2 are also cross-referenced
below for convenience.


Winer, Chapter 3                                        Class A

See Section 2.1.1.

Winer, Chapter 4, pp. 111-116                           Class C

See Section 2.3.

Winer, Chapter 5, pp. 184-187                           Class B

See Section 2.2.2.

Winer, Chapter 5, pp. 188-189                           Class B


The design on p. 188 is a hierarchical design and has program cards
(for n = 15):

BALANOVA (C)(4)(1)(2)(2)(2)(15).
(0).
(0)(0)(1).
(0)(0)(1)(2).
(1)(1)(1)(2)(3).
END P

The design on the top of p. 189 is identical to the design on p. 186
(Table 5.12-1) except for relabelling of factors.  See Section 2.2.2.

The design on the bottom of p. 189 is actually a repeated measures
design.


Winer, Chapter 6, pp. 233-238                           Class A

See Section 2.1.2.

Winer, Chapter 6, pp. 241-241                           Class A

See Section 2.1.2.

Winer, Chapter 6, pp. 252-257                           Class A


This is a 2 x 3 x 2 factorial design.  Factor A is educational level
(fixed).  Factor B is training method (fixed).  Factor C is instructor.
Factor C may be considered to be either fixed or random.  See the discussion
in Winer, p. 253.  Depending on the final choice of type for Factor C, the
F ratios calculated by Balanova 5 will differ.  Balanova 5 indicates what
denominator was used for each F ratio.
BALANOVA (C)(4)(1)(2)(3)(2)(10).
(0).   FACTOR  A 
(0).   FACTOR  B 
(1).   FACTOR  C 
(1)(1)(1)(2)(3).   FACTOR  D 
END  P 

Note the SOUPAC feature that program cards can be commented; thus the
names for factors used by Balanova 5 are given as comments and are in
agreement with the notation used in the description above.


Winer, Chapter 6, pp. 283-287                           Class A

Balanova 5 accepts designs with all factors having two levels.  Of course,
Balanova 5 uses its regular computational method rather than any special
technique for these designs.


Winer, Chapter 6, pp. 287-288                           Class A

The Wulff and Stolurow design and the Gordon design would both have
program cards similar to those in Section 2.1.  Note that Gordon considered
that both main factors were random.  Balanova 5 would make the correct F tests,
i.e. the main effects are tested against the interaction mean square and the
interaction is tested against the within cell (subject) mean square.  If
Winer's recommendation is followed, however, both of the factors should be
considered fixed.


Winer, Chapter 6, p. 289                                Class C

See Section 2.3.

Winer, Chapter 6, pp. 289-291                           Class C

The last three designs in this section of Winer are repeated measures
designs, but the subject factor is not nested in any factor and hence the
design must be a Class C design.  The Bamford and Ritchie design has program
cards:

BALANOVA (C)(3)(1)(3)(4)(9).
(0).
(0).
(1).
END P


The  pooling  of  the  subject  interactions  would  have  to  be  done  by  hand. 

The Gerathewohl et al. and Jerison studies are very similar except that
the subject interactions were not pooled.


Winer, Chapter 6, pp. 291-297                           Class A

The special least squares computation for unequal cell frequencies
described in Winer cannot be performed by Balanova 5.  The data may be input to
Balanova 5, however, and an unweighted means analysis will be performed.  The
program cards are:

BALANOVA (C)(3)(1)(2)(4)(27).
(0).
(0).
(1)(1)(1)(2).
END P


Note carefully, though, that Winer states that in this particular case the
cell frequencies are, in a real sense, an integral part of the design and
a least squares analysis is more appropriate than an unweighted means analysis.


Winer, Chapter 7                                        Class B

See Section 2.2.1.

Lindquist, Chapter 5, p. 145                            Class C

The treatments x levels design with one observation per cell is a
Class C design.  Suppose there are 4 treatments and 10 levels.  Then the
program cards are:

BALANOVA (C)(2)(1)(4)(10).
(0).
(1).
END P

The  levels  factor  must  be  random  for  there  to  be  a  test  of  the  treatment 
effect. 


Lindquist, Chapter 5, pp. 131-152                       Class A

Treatments x levels designs (with more than one observation per cell)
are Class A designs with three factors, namely, treatments, levels and
subjects.  Parenthetically, note that Lindquist's "levels" refers to a factor
name and not to particular values of this factor, called "levels" throughout
this writeup.

The  exercise  on  pp.  151-152  has  the  program  cards: 

BALANOVA (C)(3)(1)(2)(3)(5).
(0).
(0).
(1)(1)(1)(2).
END P

The two main effects are test score, factor 1, and level of ability, factor 2.
Subjects (factor 3) are nested within these two factors and subjects is the
replication factor.

Note Lindquist's comment (p. 141) that the levels factor is not a random
factor and hence the within-cell (subjects) mean square is the correct error
term for the test, ability and test x ability sources.  Balanova 5's
calculations agree with Lindquist.
Lindquist, Chapter 6                                    Class C

The treatments x subjects design is analyzed in exactly the same way
as the treatments x levels design with one observation per cell as described
above (Lindquist, Chapter 5, p. 145).


Lindquist, Chapter 7                                    Class B

The groups within treatments designs discussed in this chapter of
Lindquist are simply hierarchical designs like the hospitals within drugs
example in Section 2.2.  The treatments are drugs and the groups are hospitals.
See the program cards previously given in Section 2.2.

Lindquist suggests that in some cases, even when the cell frequencies are
proportional, it is desirable to use the unweighted means analysis.  This can
be done by the override provided by main parameter 14.  Of course, if the cell
frequencies are not proportional, the unweighted means analysis will be
performed anyway by Balanova 5.

Lindquist also suggests (p. 182) that the groups may be considered as
random samples from a population of groups, in which case the group factor
should be listed as of random type, although the tests will not actually
change.  Here is another case where the test is the same even though the
interpretation of the results will differ depending on what assumptions are
made.


Lindquist, Chapter 8                                    Class A

BALANOVA (C)(3)(1)(4)(5)(10).
(0).   FACTOR A
(1).   FACTOR B
(1)(1)(1)(2).   FACTOR C
END P

These program cards would be for the case of 4 treatments (Factor A), 5
replications (Factor B) and at most 10 subjects in each treatment-replication
group.  There are 5 x 4 = 20 groups in all.  The tests that Lindquist suggests
are based on factor B being a random factor.  These tests are:

(a)  test MS_A against MS_AB (p. 191)

(b)  test MS_AB against MS_C (p. 197)

These tests would be automatically performed by Balanova 5.


Two further comments should be made.  Lindquist suggests, on the bottom
of p. 196, that unweighted means must be used for test (a) above even
in the case of proportional designs.  If this advice is followed, the special
override in Balanova 5 must be used (see main parameter 14).  Secondly, the
pooled test suggested in the middle of p. 196 would have to be done by hand
after the analysis by Balanova 5 had been completed.  See, however, Scheffe's
chapter on mixed models for a detailed discussion of these problems.
Lindquist, Chapter 9                                    Class A

Chapter 9 contains a discussion of the general two-factor design of
which the treatments x levels design is just a special case.  The examples
are similar to those in Winer and will not be repeated.  Note that Lindquist,
p. 215, discusses the possibility that, in a design with factors A and B, if A
is random, MS_AB is the correct denominator for testing factor B.  This is, of
course, done by Balanova 5 if factor A is listed as of random type in the
program cards.


Lindquist, Chapter 10, pp. 226-228                      Class A

See Section 2.1.2.

Lindquist, Chapter 10, pp. 230-237                      Class A

This is a design with random replications of a two-factor experiment and has
the following program cards:

BALANOVA (C)(4)(1)(a)(b)(c)(d).
(0).   FACTOR A
(0).   FACTOR B
(1).   FACTOR C
(1)(1)(1)(2)(3).   FACTOR D
END P


Note that both factors C and D are random.  The numbers of levels for A, B
and C are a, b and c, which would be integers in an actual example.  The number
of levels for D (d in the above example) would be the maximum number of
replications in any one cell.  The tests given in Lindquist will be carried
out by Balanova 5.  That is, the interaction AB will be tested against MS_ABC;
AC and BC will be tested against MS_within = MS_D; A is tested against
MS_AC and B against MS_BC.


Lindquist, Chapter 10, pp. 237-238                      Class C

The treatments x treatments x subjects design is a Class C design.  It
is identical to the Bamford and Ritchie design described above (Winer,
Chapter 6, pp. 289-291).


Lindquist, Chapter 10, pp. 238-243                      Class A

The designs on these pages are similar to designs discussed previously.
Care must be taken to decide which factors, if any, are random.  The computed
mean squares are not affected by this choice, but the choice of denominators
is.


Lindquist, Chapter 13                                   Class B

Three of Lindquist's designs, Types I, III and VI, are acceptable to
Balanova 5 as Class B designs.
Type I has program cards:

BALANOVA (C)(3)(1)(4)(3)(10).
(0).
(0).
(1)(1)(2).
END P

This table corresponds to the chart on the top of p. 268, where the maximum
of n_1, n_2 and n_3 is 10.

Type III has program cards:

BALANOVA (C)(4)(1)(4)(2)(3)(40).
(0).   FACTOR A
(0).   FACTOR B
(0).   FACTOR C
(1)(1)(2)(3).   FACTOR D
END P

This table corresponds to the chart on p. 282, where the maximum number of
replications in a BC cell is 40.

Type VI has program cards:

BALANOVA (C)(4)(1)(2)(3)(4)(10).
(0).   A
(0).   B
(0).   C
(1)(1)(3).   D   SUBJECTS
END P

This table corresponds to the chart on p. 292, where the maximum number of
replications in any level of C is 10.
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APPENDIX  B-QUICK  REVIEW  OF  PARAMETER  CARDS 

BALANOVA 5

Parameter
Number      Description

1           Input address. CARDS, SEQUENTIAL 1-15.
2           Number of factors.
3           Number of dependent variables.
4-13        Number of levels of factors.
14          1 if desire unweighted means.
15          1 to suppress means.

Subparameter
Number      Description

1           1 if random factor.
2           1 if replication factor.
3-11        Factors in which this factor is nested.

END PROGRAM is required.
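The card layout summarized above can be generated mechanically. The sketch below is a hypothetical Python helper, `balanova_cards`, written only from the Type I, III and VI decks shown earlier. It assumes that trailing empty parameters (14 and 15) may be omitted, as those decks do, and that a fixed factor which is neither a replication factor nor nested gets the single subparameter card `(0).`.

```python
def balanova_cards(factors, input_address="C", dependent_variables=1):
    """Return the BALANOVA 5 call card, one subparameter card per factor, and END P.

    Each factor is a dict with 'levels' (int) and optional 'random' (bool),
    'replication' (bool) and 'nested_in' (list of 1-based factor numbers).
    """
    params = [input_address, len(factors), dependent_variables]
    params += [f["levels"] for f in factors]              # parameters 4-13
    cards = ["BALANOVA" + "".join(f"({p})" for p in params) + "."]
    for f in factors:
        sub = [1 if f.get("random") else 0]               # subparameter 1
        nested = f.get("nested_in", [])
        if f.get("replication") or nested:
            sub.append(1 if f.get("replication") else 0)  # subparameter 2
            sub.extend(nested)                            # subparameters 3-11
        cards.append("".join(f"({s})" for s in sub) + ".")
    cards.append("END P")
    return cards


# Lindquist Type I: subjects (factor 3) are random replications nested in factor 2.
for card in balanova_cards([
    {"levels": 4},
    {"levels": 3},
    {"levels": 10, "random": True, "replication": True, "nested_in": [2]},
]):
    print(card)
```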


CLASSIFICATION 


I.  General  Description 

The CLASSIFICATION program is designed to measure individuals against
previously determined groups in order to determine probable group
membership. The classification is done in a reduced test space derived from
discriminant analysis. The method of classification is based on the premise
that a group is totally described by its mean (or centroid) and dispersion;
the individual's relation to each group is determined by a χ² which indicates
how many members of the group are farther from the centroid than he, and by a
Bayesian probability of membership in the group based on this χ². For each
individual, the χ² and probability for each group are given; the user then
applies a decision rule of his choice for assigning individuals to groups.

Since  analysis  is  to  be  performed  in  a  reduced  space,  the  means  and  dis- 
persion of  each  group  must  also  be  reduced  to  this  space.   The  CLASSIFICATION 
program  performs  these  reductions. 

The  calculation  of  probabilities  requires  the  specification  of  the  number 
of  members  in  each  group  against  which  the  individual  is  being  compared. 
They  may  be  the  numbers  actually  in  the  groups  used  for  finding  experimental 
means  and  standard  deviations  or  the  number  of  individuals  from  the  total 
group  being  tested  who  are  to  be  assigned  to  each  group.   These  numbers  are 
specified  along  with  the  number  of  discriminant  functions  as  an  input  vector. 

In every case the input is expected in the form in which it is output by
the DISCRIMINANT ANALYSIS program.

II. Formulas and Calculations

The following formulas are given in terms of matrix arithmetic except for
divisions among singletons. Dimensions i, j, k are respectively for variables,
functions and groups.

$V$ (with dimensions i, j): discriminant vectors (input) for i variables
and j functions

$X$ (with dimensions k, i): group means for k groups (input)

$C$ (with dimensions k, j) $= X V$: centroids in reduced space. Let $c_k$
(vector dimensioned j) be the row of $C$ for one group.

The following are calculated for each group, k:

$D_k$ (with dimensions i, i): dispersion matrix for group k (input)

$\tilde{D}_k$ (with dimensions j, j) $= V' D_k V$: reduced dispersion matrix for group k

$\tilde{D}_k^{-1}$ (with dimensions j, j): inverse of reduced dispersion matrix for group k

$G_k$ (singleton): group sample size (input)

$R_k$ (singleton) $= G_k / |\tilde{D}_k|$: ratio for group k, where $|\tilde{D}_k|$
is the determinant of $\tilde{D}_k$

The following are calculated for each subject of raw data entered:

$S$ (vector dimensioned i): row of data for a subject (input)

For each group k and each subject:

$\chi^2_k = d' \tilde{D}_k^{-1} d$, where $d = S V - c_k$ (taken as a column vector)

$P_k = R_k \, e^{-\chi^2_k / 2}$, where e is the base of the natural logarithm

Then for the subject $P_T = \sum_k P_k$, and the

Probability of membership in group k: $p(k) = P_k / P_T$
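The calculations above can be sketched in Python with NumPy. `classify` is a hypothetical name; its arguments follow the text (V with the vectors as columns, X with one group's means per row, one dispersion matrix D per group, G as the group sizes, S with one subject per row). The ratio R is coded exactly as the text states it, G divided by the determinant of the reduced dispersion matrix; a textbook multivariate normal density would use the square root of that determinant, so that line is the first thing to check against Cooley and Lohnes.

```python
import numpy as np


def classify(V, X, D, G, S):
    """Return (chi2, prob), each with one row per subject and one column per group."""
    V, X, S = (np.asarray(a, dtype=float) for a in (V, X, S))
    C = X @ V                                     # centroids in reduced space
    n, k = S.shape[0], X.shape[0]
    chi2 = np.empty((n, k))
    P = np.empty((n, k))
    for g in range(k):
        D_red = V.T @ np.asarray(D[g], dtype=float) @ V   # reduced dispersion V'DV
        D_inv = np.linalg.inv(D_red)
        R = G[g] / np.linalg.det(D_red)           # ratio as stated in the text
        d = S @ V - C[g]                          # deviations, one row per subject
        chi2[:, g] = np.einsum("nj,jk,nk->n", d, D_inv, d)   # d' D^-1 d per subject
        P[:, g] = R * np.exp(-chi2[:, g] / 2)
    prob = P / P.sum(axis=1, keepdims=True)       # p(k) = P_k / P_T
    return chi2, prob
```

With equal group sizes and equal dispersions, a subject midway between two centroids gets equal χ² values and a probability of 0.5 for each group.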
III. Parameters


The program name CLASSIFICATION is followed by these parameters on the
program call card:

Parameter
Number      Description

1           Input Address of discriminant vectors, V. The vectors are
            expected as columns. CARDS or SEQUENTIAL 1-15.
2           Input Address of means, X. The means for a given group are
            expected as a row. CARDS or SEQUENTIAL 1-15.
3           Input Address of dispersion matrices, D. The dispersion
            matrices are expected in a vertically augmented form. CARDS
            or SEQUENTIAL 1-15.
4           Input Address of individual scores, S. CARDS or
            SEQUENTIAL 1-15.
5           Output Address for inverse of dispersion matrix. PRINT or
            SEQUENTIAL 1-15.
6           Output Address for χ² and probabilities for each subject.
            SEQUENTIAL 1-15 and/or PRINT. (Output on SEQUENTIAL is in
            the form $N_i, X_{i1} \ldots X_{im}, Y_{i1} \ldots Y_{im}$, where
            $N_i$ = sequential subject number in groups,
            $X_{ij}$ = j'th probability for i'th subject,
            $Y_{ij}$ = j'th χ² for i'th subject.)
7           Number of groups.
8           Number of variables.
9           Input Address of number of discriminant functions and group
            sample sizes, G, as a single row.* CARDS or SEQUENTIAL 1-15.

Output may be punched directly. See the SOUPAC Manual, section on input
and output.

*  Must be a row vector (1 observation) with the number of discriminant
functions as the first variable. The form output by the DISCRIMINANT ANALYSIS
program may be used. If coming from CARDS, this data deck should be first,
followed by any other card decks in the order listed in the parameters.

IV.   References 

Cooley, William W. and Lohnes, Paul R., Multivariate Procedures for the
Behavioral Sciences, Chapter 7, John Wiley & Sons, Inc., New York, 1962.
DISCRIMINANT  ANALYSIS


I.   GENERAL  DESCRIPTION 


Suppose that we have k populations (groups) and p measures (variables)
on each member of each population. We want to test the hypothesis that our
groups are significantly different on the entire set of variables. This
one-way multivariate analysis of variance hypothesis is tested by this
program. The program then locates the dimensions (discriminant functions)
along which the group differences are maximum. Thus, we need some
function to transform the p variates into a smaller set of independent
measures which will indicate the differences between the groups. The
DISCRIMINANT ANALYSIS program finds the independent linear functions of
the variables which maximally discriminate between the populations (groups
input). The results from this program, namely the discriminant functions,
may be used in the CLASSIFICATION program to determine the probability that
any subject belongs in any group. Also, by looking at the coefficients of
the functions, we can determine to what extent each of the p variates
contributes to each function. In order to do this we need to determine the
coefficients of the functions such that the ratio of variances between
groups to the variances within groups is maximized, i.e. the differences
between groups are to be large relative to the differences within groups.

II.   CALCULATIONS  AND  FORMULAS 


In matrix terms, we are trying to maximize the ratio

$\lambda_i = \dfrac{f_i' A f_i}{f_i' W f_i}$

where $f_i$ is the eigenvector associated with the i-th eigenvalue $\lambda_i$ of $W^{-1}A$,
A = the covariance matrix between means,

$a_{ij} = \sum_{g=1}^{k} N_g (\bar{X}_{ig} - \bar{X}_i)(\bar{X}_{jg} - \bar{X}_j)$

and W = the covariance matrix within classes,

$w_{ij} = \sum_{g=1}^{k} \left[ \sum_{n=1}^{N_g} (X_{ign} - \bar{X}_{ig})(X_{jgn} - \bar{X}_{jg}) \right]$

where k = number of groups, $N_g$ = number of subjects in group g, N = total
number of subjects, and i and j run from 1 to p, where p = number of
variables.

To find the maximum, we derive from the partial derivatives of that
ratio the matrix equation

$(W^{-1}A - \lambda I) F = 0$

where F is the matrix of eigenvectors. The eigenvectors are the coefficients
of the discriminant functions. The relative sizes of the eigenvalues
indicate the extent to which the associated discriminant functions distinguish
among the groups. The percentage of the total discriminating power of
the variables contained in the j-th discriminant function is represented by

$100 \left( \lambda_j \Big/ \sum_{i=1}^{r} \lambda_i \right)$

where r is the smaller of k-1 and p.

In addition to obtaining the eigenvalues and discriminating coefficients,
the program will compute scaled vectors to show the relative contributions
of the variables to the discriminant function by

$f^{*}_{ij} = (w_{ii})^{1/2} f_{ij}$
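A minimal NumPy sketch of these formulas, assuming raw data arrive as one array per group (rows are subjects, columns are the p variables). `discriminant_functions` is a hypothetical name. NumPy scales each eigenvector to unit length, which need not match the normalization SOUPAC printed, so compare ratios of coefficients rather than raw values.

```python
import numpy as np


def discriminant_functions(groups):
    """Return A, W, the r largest eigenvalues of W^-1 A, their eigenvectors F,
    the percentage of discriminating power, and the scaled vectors."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    grand_mean = np.vstack(groups).mean(axis=0)
    p = grand_mean.size
    A = np.zeros((p, p))
    W = np.zeros((p, p))
    for g in groups:
        diff = g.mean(axis=0) - grand_mean
        A += g.shape[0] * np.outer(diff, diff)     # among-groups a_ij
        dev = g - g.mean(axis=0)
        W += dev.T @ dev                           # pooled within-groups w_ij
    eigvals, F = np.linalg.eig(np.linalg.solve(W, A))   # eigen-analysis of W^-1 A
    order = np.argsort(eigvals.real)[::-1]         # largest eigenvalue first
    r = min(len(groups) - 1, p)
    eigvals = eigvals.real[order][:r]
    F = F.real[:, order][:, :r]
    percent = 100 * eigvals / eigvals.sum()
    scaled = np.sqrt(np.diag(W))[:, None] * F      # f*_ij = w_ii^(1/2) f_ij
    return A, W, eigvals, F, percent, scaled
```

With two groups only one function is retained (r = k - 1 = 1), and A + W reproduces the total cross-products matrix T listed under the output.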


III.   INPUT  DATA 


Input  to  the  DISCRIMINANT  ANALYSIS  program  consists  of  two  or  more 
data  groups.   Each  data  group  consists  of  a  set  of  observations  on  two  or 
more  variables.   All  of  the  groups  must  contain  observations  on  the  same 
set  of  variables.   The  groups  may  be  input  as  separate  card  decks  (each 
preceded  by  a  DATA  format  card  and  followed  by  an  END#  card),  as  data 
groups  located  on  separate  temporary  storage  areas,  or  as  a  mixture  of 
data  groups  on  card  decks  and  data  decks  on  temporary  storage  areas.   See 
section  of  examples. 

The  discriminant  functions  can  be  computed  either  from  raw  data  or 
from  W  and  T  matrices,  where  each  is  a  separate  card  deck  or  input  file. 
See  Output  section  for  description  of  W  and  T. 

IV.   SIGNIFICANCE  TESTS 

The measure of significance calculated in the DISCRIMINANT ANALYSIS
program is Wilks' lambda (likelihood ratio test statistic). This is a
test of the discriminating power of the test battery. It tests the
hypothesis that the population centroids (mean vectors) are equal for the k
groups. Wilks' lambda is a function of the roots of $W^{-1}A$ and is of
the following form:

$\Lambda = \prod_{i=1}^{r} \dfrac{1}{1 + \lambda_i}$

where r is the lesser of k-1 and p.

In matrix terms this criterion is

$\Lambda = \dfrac{|W|}{|T|}$

where |W| and |T| are determinants defined in the following manner: W is the
pooled within-groups deviation score cross-products matrix and T is the total
sample deviation cross-products matrix. As |T| increases relative to |W|, the
ratio decreases in value, with an accompanying increase in the confidence that
the group centroids are not equal.
An F ratio which yields an approximate test of the significance of
Wilks' lambda is calculated and printed:

$F = \dfrac{1 - \Lambda^{1/s}}{\Lambda^{1/s}} \cdot \dfrac{ms + 2\lambda}{2r}$

where

$s = \sqrt{(p^2 q^2 - 4) / (p^2 + q^2 - 5)}$
$m = n - (p + q + 1)/2$
$\lambda = -(pq - 2)/4$
$r = pq/2$
$q = k - 1$
$n = N - 1$
N = total number of subjects
k = number of groups
p = number of variables

Here λ and r denote the quantities just defined, not the eigenvalues $\lambda_i$
or the number of discriminant functions used above.

The degrees of freedom to be used with the F value printed in the output
are printed and are labeled F1 (degrees of freedom for the numerator)
and F2 (degrees of freedom for the denominator) and equal 2r and
$ms + 2\lambda$, respectively.
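Both statistics can be checked with a short pure-Python sketch. The helper names are hypothetical, and setting s = 1 when p² + q² - 5 is not positive is the usual convention for this approximation, not something stated in the text. For two groups (q = 1) the approximation is exact and reduces to F = (1 - Λ)/Λ · (N - p - 1)/p, which makes a convenient check.

```python
import math


def wilks_lambda(eigenvalues):
    """Wilks' lambda from the eigenvalues of W^-1 A: product of 1 / (1 + lambda_i)."""
    result = 1.0
    for lam in eigenvalues:
        result /= 1.0 + lam
    return result


def rao_f(wilks, n_subjects, n_groups, n_variables):
    """Approximate F for Wilks' lambda; returns (F, F1, F2)."""
    p, q = n_variables, n_groups - 1
    n = n_subjects - 1
    denom = p * p + q * q - 5
    # Assumed convention: s = 1 when the denominator is not positive.
    s = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    m = n - (p + q + 1) / 2
    lam = -(p * q - 2) / 4
    r = p * q / 2
    root = wilks ** (1 / s)
    df1, df2 = 2 * r, m * s + 2 * lam
    F = (1 - root) / root * df2 / df1
    return F, df1, df2


# 20 subjects, 2 groups, 3 variables, lambda = 0.5: F = 16/3 on 3 and 16 d.f.
print(rao_f(0.5, 20, 2, 3))
```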

V. OUTPUT

The output consists of the following:

1. Means of input variables for each group and group sample size.
   (Parameter Number 8)

2. A dispersion matrix for each group. (Parameter Number 9)

3. The total sample deviation score cross-products matrix

   $t_{ij} = \sum_{n=1}^{N} (X_{in} - \bar{X}_i)(X_{jn} - \bar{X}_j)$

   where i and j range over the variables. This matrix is the sum
   of the A and W matrices described in Section II. This is the T matrix
   referred to in Parameter Number 7. The diagonal of this matrix
   contains the sums of squares. (Parameter Number 6)

4. The pooled within-groups deviation scores cross-products matrix, which
   is labeled W on the output. (Parameter Number 6)

5. The total number of subjects in all the groups combined. (Parameter
   Number 6)

6. The means and standard deviations of the variables across all
   groups. (Parameter Number 6)

7. The correlation matrix of variables over all groups. (Parameter
   Number 6)

8. The among-groups cross-products of deviations of group means from grand
   means, weighted by group sizes. This matrix is labeled A matrix on
   the output. (Parameter Number 6)

9. The eigenvalues for the $W^{-1}A$ matrix. (Printed)

10. The eigenvalues and percentage of variance explained by each
    additional eigenvalue. (Printed on output, automatically)

11. The trace of the $W^{-1}A$ matrix. This is the sum of the eigenvalues.
    (Printed on output, automatically)

12. The discriminant functions ($f_{ij}$). The number of discriminant
    functions will equal r, where r is the lesser of the two values
    k-1 and p, where k = the number of groups and p = number of
    variables. (Parameter Number 1)

13. The group means on the discriminant functions. This is a
    k x r matrix formed by multiplying the group means on variables
    and the discriminant functions. The matrix may be used to determine
    the relative positions of the groups on the derived functions.
    (Parameter Number 11)

14. The scaled vectors. These vectors are formed by multiplying the
    discriminant functions by the square roots of the diagonal of
    the W matrix described above. The scaled vectors show the
    relative contributions of the input variables to each of the
    discriminant functions. (Parameter Number 5)

15. The measures of significance described in Section IV. (Parameter
    Number 4)


VI. RESTRICTIONS

1. If raw data is input from cards, each group should be preceded by
   a DATA card and concluded with an END# card in accordance with
   SOUPAC conventions. If coming from sequentials, data must be on
   separate sequential files.

2. The number of subjects in a group must be greater than or equal
   to the number of variables.

3. The W matrix may not be singular, that is, deviations from group
   means may not be linearly dependent.

4. Variables may not be constant within a group.

VII. PARAMETERS

The DISCRIMINANT ANALYSIS parameters follow the program name on the main
program card. Each parameter must be enclosed in parentheses. The
parameters must appear in the order given below. If a parameter is not
needed, do not punch anything between its parentheses. All parentheses
after the last non-empty pair may be omitted.


Parameter
Number      Description

1†          Output address of discriminant functions (matrix $f_{ij}$).
            SEQUENTIAL 1-15 and/or PRINT. (Needed for CLASSIFICATION.)
2           Number of variables.
3           Number of groups.
4           1 if desire significance measures printed.
5           1 if desire scaled discriminant vectors printed.
6           1 if desire intermediate results printed.
7           1 if input is W and T matrices instead of raw data.
8           Output address of group means on original variables
            and sample size (printed only).* SEQUENTIAL 1-15
            and/or PRINT. (See Parameter 12.) (Needed for
            CLASSIFICATION.)
9†          Output address of group dispersion matrices of
            original variables.* SEQUENTIAL 1-15 and/or PRINT.
            (Needed for CLASSIFICATION.)
10          N = total number of subjects in all groups combined.
            This parameter is left blank if raw data is input
            rather than W and T matrices.
11†         Output address of group means on discriminant
            functions. SEQUENTIAL 1-15 and/or PRINT.
12          Output address of group sample sizes, needed for
            CLASSIFICATION.

*  If W and T are input instead of raw data, group means and dispersion
matrices are not printed. Means and dispersion matrices on discriminant
functions are not computed in this case.

†  It is possible to print in F format and/or punch the output from these
parameters. If you need either of these options, see the section in the
INTRODUCTION on INPUT and OUTPUT.

VIII.   INPUT  PROCEDURE 

Input  addresses  of  raw  data  groups  or  W  and  T  matrices  are  listed  on  a 
$INPUT  card  following  the  main  parameter  card.   W  precedes  T  in  the  sequence 
when  these  are  input.   W  and  T  may  be  output  using  a  $OUTPUT  card.   Output 
order  is  the  same  as  input. 


If more raw data input groups are specified (parameter 3) than
addresses on the $INPUT card, the last address specified is reused as
many times as needed to provide that (parameter 3) number of groups. This
is especially valuable if several decks of cards are input consecutively.
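The reuse rule amounts to padding the address list with its last entry. A hypothetical Python helper, assuming addresses are given as strings such as `C` or `S5`:

```python
def assign_input_addresses(addresses, n_groups):
    """Pair each of n_groups raw-data groups with an $INPUT address.

    When fewer addresses than groups are given, the last address is
    reused for every remaining group.
    """
    if not addresses:
        raise ValueError("at least one $INPUT address is required")
    return [addresses[min(i, len(addresses) - 1)] for i in range(n_groups)]


# $INPUT(S5)(C). with parameter 3 = 4: one group from S5, three card decks.
print(assign_input_addresses(["S5", "C"], 4))
```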

IX.  SPECIAL  COMMENTS 

1.  This program does not check for missing data. All blank spaces
are read as zeros.

2.  The user is cautioned against using the DISCRIMINANT ANALYSIS
program without an understanding of the statistical technique
used. See section of references.

3.  Discriminant scores may be obtained by matrix multiplication of raw
data x discriminant functions (Parameter 1).
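Comment 3 and output item 13 are both plain matrix products. A minimal NumPy sketch with hypothetical helper names; F is the p x r matrix of discriminant functions written out under Parameter 1.

```python
import numpy as np


def discriminant_scores(raw, F):
    """Scores for each subject: raw data (n x p) times discriminant functions (p x r)."""
    return np.asarray(raw, dtype=float) @ np.asarray(F, dtype=float)


def group_means_on_functions(groups, F):
    """k x r matrix: each group's mean vector times the discriminant functions."""
    means = np.array([np.asarray(g, dtype=float).mean(axis=0) for g in groups])
    return means @ np.asarray(F, dtype=float)
```

Because both are linear, the mean of a group's discriminant scores equals that group's row of the k x r matrix.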

X. EXAMPLES

Example A

/*ID
//   EXEC  SOUP
//SYSIN  DD   *
MATRIX.
MOV(C)(S5).
END P
DIS(S1/P)(40)(4)(1)()()()(P).
$INP(C)(C)(S5)(C).
END SOUPAC
DATA(40)(40F2.0)
   (1st Data Deck)
END#
DATA(40)(40F2.0)
   (2nd Data Deck)
END#
DATA(40)(40F2.0)
   (3rd Data Deck)
END#
DATA(40)(40F2.0)
   (4th Data Deck)
END#
/*

Example B

/*ID
//   EXEC   SOUP
//   SYSIN  DD  *
DIS(S1/P)(33)(4)(1)( )( )(1)( )( )(96).
$INPUT(C)(C).
END SOUP
DATA(20)(10X,5E14.7)
   (Data for W)
END#
DATA(20)(10X,5E14.7)
   (Data for T)
END#
/*

In Example A, four groups of data are being input, with the first two
groups coming from cards, the third from temporary storage on S5 and the
fourth from cards. Discriminant functions are stored on S1 and printed,
and group means are printed. Significance measures are calculated for 40
variables input. In Example B, W and T matrices are input and much the
same results are obtained as for Example A. Group means cannot be
calculated.
/*ID   (accounting information)
//  EXEC  SOUPAC
//SYSIN  DD  *
DIS(S1)(15)(2)()()()()(S2)(S3)()()(S4).
$INP(C)(C).
CLA(S1)(S2)(S3)(C)(P)(P)(2)(15)(S4).
END S
DATA(15)( format )
   (1st data deck)
END#
DATA(15)( format )
   (2nd data deck)
END#
DATA(15)( format )
   (3rd data deck)
END#


This illustrates the use of the DISCRIMINANT and CLASSIFICATION programs.
The DISCRIMINANT program will save discriminant functions on S1; it will
operate on 15 variables for each of two groups. It will output group means
on S2 and group dispersion matrices on S3 and store group sample sizes on S4.

CLASSIFICATION, in turn, will read discriminant functions, group means,
and group dispersion matrices from S1, S2, and S3 respectively. It will read
the group to be classified from cards and print the inverse of the dispersion
matrices and the χ² and probability. It will expect two sets of group means
and dispersion matrices from 15 variables but 1 discriminant function (see
Section V). Original group sample sizes are read from S4.
T-TEST


I.   General  Description 


The  T-TEST  program  calculates  a  T  coefficient  or  F  ratio  as  described 
below: 

Suboperation (1): Paired T-Test (also called correlated T-Test).

Variables are in a row; variable one is paired with variable two, three with
four, etc., and a paired T coefficient is calculated for each pair as
follows:

$t = \bar{d} / S_{\bar{d}}$

$\bar{d} = \sum_{i=1}^{N} [X_{\alpha i} - X_{\beta i}] / N$, where α and β are the j-th pair of variables

$S_{\bar{d}} = \left[ \dfrac{N \sum_{i=1}^{N} [X_{\alpha i} - X_{\beta i}]^2 - \left( \sum_{i=1}^{N} [X_{\alpha i} - X_{\beta i}] \right)^2}{N^2 f} \right]^{1/2}$

where f = degrees of freedom = N or N - 1 as desired, and N is the sample
size for the j-th pair of variables.

Suboperation (2): Paired T-Test for all possible combinations of
variables, computed as in Suboperation (1).
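A pure-Python sketch of Suboperations (1) and (2), using the raw-score formula for the standard error given above. The function names are hypothetical, rows are subjects, and missing-data handling (parameter 3) is left out.

```python
import math
from itertools import combinations


def paired_t(x, y, df_is_n=False):
    """Paired t for one pair of variables; f = N if df_is_n else N - 1."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    f = n if df_is_n else n - 1
    mean_d = sum(d) / n
    se = math.sqrt((n * sum(v * v for v in d) - sum(d) ** 2) / (n * n * f))
    return mean_d / se


def paired_tests(rows, all_pairs=False, df_is_n=False):
    """Suboperation 1 pairs variables (1,2), (3,4), ...; suboperation 2 uses all pairs.

    Keys are 1-based variable numbers.
    """
    columns = list(zip(*rows))
    if all_pairs:
        pairs = list(combinations(range(len(columns)), 2))
    else:
        pairs = [(i, i + 1) for i in range(0, len(columns) - 1, 2)]
    return {(a + 1, b + 1): paired_t(columns[a], columns[b], df_is_n) for a, b in pairs}


# Differences 1, 2, 3, 4 with N - 1 degrees of freedom give t = sqrt(15).
print(paired_tests([[1, 0], [2, 0], [3, 0], [4, 0]]))
```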

Suboperation (3): Test of differences from a known population mean.
A population mean must be provided for each column of data or the
mean will be set to zero. Population means should be provided as a
row vector. The following are calculated and printed for each
variable:

t value: $t = (\bar{X} - \mu) / S_{\bar{X}}$, where μ = parameterized value or zero

Mean: $\bar{X} = \sum_{i=1}^{N} X_i / N$

Standard Deviation: $\mathrm{S.D.} = \left[ \dfrac{N \sum X_i^2 - (\sum X_i)^2}{N \times (\mathrm{d.f.})} \right]^{1/2}$

NOTE: N - 1 is the usual degrees of freedom, but N may be specified.

Standard Error of Mean: $S_{\bar{X}} = \mathrm{S.D.} / \sqrt{N}$
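Suboperation (3) for one variable, as a minimal sketch. `one_sample_t` is a hypothetical name, and `df_is_n=True` corresponds to choosing N instead of N - 1 as the degrees of freedom.

```python
import math


def one_sample_t(values, mu=0.0, df_is_n=False):
    """Return (t, mean, S.D., standard error) using the raw-score formulas."""
    n = len(values)
    total = sum(values)
    mean = total / n
    dof = n if df_is_n else n - 1
    sd = math.sqrt((n * sum(v * v for v in values) - total ** 2) / (n * dof))
    se = sd / math.sqrt(n)
    return (mean - mu) / se, mean, sd, se


print(one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu=4.0))
```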


Suboperation (4): Test of differences from a known population mean for
previously analyzed data: the mean ($\bar{X}$), standard deviation (S.D.), and
sample size (N), as well as the population mean [see Suboperation (3)], are
read in, and the program computes the standard error of the mean and t
for each trio of $\bar{X}$, S.D., and N, read in that order. If R trios of
data are used (R observations on rows occur), then R population means
should be given. Calculations are the same as in Suboperation (3).

NOTE: A column of data of Suboperation (3) is reduced to a three-item
row here.
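Suboperation (4) skips the raw data and works from the trios directly. A hypothetical sketch, assuming each row is `(mean, sd, n)` and that omitting the population means implies zero criteria, as the Special Comments describe for a missing second address.

```python
import math


def t_from_summaries(trios, population_means=None):
    """Return (standard error, t) for each (mean, S.D., N) row."""
    if population_means is None:
        population_means = [0.0] * len(trios)   # no second address: zero criterion
    results = []
    for (mean, sd, n), mu in zip(trios, population_means):
        se = sd / math.sqrt(n)
        results.append((se, (mean - mu) / se))
    return results


print(t_from_summaries([(5.0, 2.0, 16), (3.0, 3.0, 9)], [4.0, 0.0]))
```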


Suboperation (5): Test of differences between two or more group means
taken pairwise: each group is located on a separate storage location
or set of cards. The following are calculated for one variable, two
groups:

t value: $t = (\bar{X}_i - \bar{X}_j) / S_{\bar{X}_i - \bar{X}_j}$, where i and j are two groups

Mean: $\bar{X}_i = \sum_{k=1}^{N_i} X_{ik} / N_i$ for the variable over group i, where $N_i$ is the
sample size of group i

Standard Deviation: $(SD)_i = \left[ \dfrac{N_i \sum_{k=1}^{N_i} X_{ik}^2 - \left( \sum_{k=1}^{N_i} X_{ik} \right)^2}{N_i \times (\mathrm{d.f.})_i} \right]^{1/2}$,
where $(\mathrm{d.f.})_i$ is $N_i$ or $N_i - 1$

Pooled Estimate of Variance: $S^2 = \left[ (\mathrm{d.f.})_i (SD)_i^2 + (\mathrm{d.f.})_j (SD)_j^2 \right] / (\mathrm{d.f.})$,
where (d.f.) is $N_i + N_j$ or $N_i + N_j - 2$ for N or N - 1 respectively

Estimate of Standard Error: $S_{\bar{X}_i - \bar{X}_j} = \left[ \dfrac{S^2 (N_i + N_j)}{N_i N_j} \right]^{1/2}$

Non-pooled Estimate of Standard Error:

$S_n = \left[ (SD)_i^2 / (N_i - 1) + (SD)_j^2 / (N_j - 1) \right]^{1/2}$
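A pure-Python sketch of the pooled and non-pooled versions with N - 1 degrees of freedom (the helper name is hypothetical). The non-pooled branch uses s²/N for each group; this equals the printed (SD)²/(N - 1) term when (SD) is computed with N degrees of freedom.

```python
import math


def two_group_t(x, y, pooled=True):
    """t for the difference between two group means on one variable."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)          # sums of squared deviations
    ss2 = sum((v - m2) ** 2 for v in y)
    if pooled:
        s2 = (ss1 + ss2) / (n1 + n2 - 2)         # pooled estimate of variance
        se = math.sqrt(s2 * (n1 + n2) / (n1 * n2))
    else:
        se = math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))
    return (m1 - m2) / se


print(two_group_t([1, 2, 3], [4, 5, 6]))
```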

Suboperation (6): One-way analysis of variance. The data to be compared
are located on different storage units or sets of cards. Calculations are
made as follows for each variable:

S = number of storage units = number of subgroups

$N_i$ = number of observations in the i-th subgroup, i = 1, ..., S, and for
each i, j = 1, ..., $N_i$

$X_{ij}$ = element in the j-th row of the i-th subgroup

$N = \sum_{i=1}^{S} N_i$ = total observations

$\text{Total SS} = \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij}^2 - \left( \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij} \right)^2 / N$

$\text{Between SS} = \sum_{i=1}^{S} \left[ \left( \sum_{j=1}^{N_i} X_{ij} \right)^2 / N_i \right] - \left( \sum_{i=1}^{S} \sum_{j=1}^{N_i} X_{ij} \right)^2 / N$

Within SS = Total SS - Between SS

Within D.F. = Total D.F. - Between D.F., where Total D.F. = N - 1 and
Between D.F. = S - 1

$F = \dfrac{\text{BSS} / \text{BDF}}{\text{WSS} / \text{WDF}}$
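The raw-score formulas translate directly. A minimal sketch with a hypothetical function name, taking one list of scores per subgroup:

```python
def one_way_anova(groups):
    """Return (Between SS, Within SS, Between D.F., Within D.F., F)."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups)
    correction = grand ** 2 / n                     # (sum of all X)^2 / N
    total_ss = sum(v * v for g in groups for v in g) - correction
    between_ss = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    within_ss = total_ss - between_ss
    between_df = len(groups) - 1
    within_df = (n - 1) - between_df
    f = (between_ss / between_df) / (within_ss / within_df)
    return between_ss, within_ss, between_df, within_df, f


print(one_way_anova([[1, 2, 3], [4, 5, 6]]))
```

With two subgroups, F equals the square of the pooled t from Suboperation (5).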

Suboperation (7): One-way analysis of covariance. The experimental
(dependent) variable comes first, followed by 1 or more covariates. The
dependent variable is adjusted to the set of covariates, not iteratively to
one covariate at a time. Subgroups or factor levels are handled as in
analysis of variance, Suboperation (6), i.e. as separate input decks or
temporary storage locations. Statistics obtained are means and standard
deviations, test of homogeneity of regression, F-ratio for covariance, and
adjustment coefficients. For further discussion see Winer, p. 578 ff.


$c_{ijk} = \sum_k X_i X_j - \dfrac{1}{N_k} \sum_k X_i \sum_k X_j$: deviation cross-products for all
variables X in group k (including experimental), where $X_i$ and $X_j$ are two
variables, $N_k$ is the number of subjects, and $\sum_k$ is summation over group k

$C_k = \begin{pmatrix} Z_k & A_k' \\ A_k & D_k \end{pmatrix}$, where $c_{ijk}$ is an element of $C_k$, rows and
columns are ordered y, $x_1, \ldots, x_m$, y is the experimental variable, and
$x_1, \ldots, x_m$ are covariates

$S_1 = \sum_k (Z_k - A_k' D_k^{-1} A_k)$ = unpooled within-group sum of squares

Group mean: $\bar{X}_{ik} = \dfrac{1}{N_k} \sum_k X_i$ for group k

Group standard deviation: $s_{ik} = \sqrt{c_{iik} / (\mathrm{d.f.})_k}$

$t_{ij} = \sum X_i X_j - \dfrac{1}{N} \sum X_i \sum X_j$: deviation cross-products for all X overall

$T = \begin{pmatrix} Z & A' \\ A & D \end{pmatrix}$, where $t_{ij}$ is an element of T, y is the experimental
variable, and $x_1, \ldots, x_m$ are covariates

$S_5 = Z - A' D^{-1} A$ = overall sum of squares

Overall standard deviation: $s_i = \sqrt{t_{ii} / (\mathrm{d.f.})}$

Correlation: $r_{ij} = t_{ij} / \sqrt{t_{ii} \, t_{jj}}$

Overall mean: $\bar{X}_i = \dfrac{1}{N} \sum X_i$

$w_{ij} = \sum_k \left[ \sum_k X_i X_j - \dfrac{1}{N_k} \sum_k X_i \sum_k X_j \right] = \sum_k c_{ijk}$

$W = \begin{pmatrix} Z_w & A_w' \\ A_w & D_w \end{pmatrix}$, where $w_{ij}$ is an element of W, y is the experimental
variable, and $x_1, \ldots, x_m$ are covariates

$S_2 = Z_w - A_w' D_w^{-1} A_w$ = adjustment sum of squares (pooled)

Then

$S_3 = S_2 - S_1$
$S_4 = S_5 - S_2$

$L_1 = N - (m+1)k = \sum_k [N_k - (m+1)]$ (cf. Winer, p. 591)
$L_2 = N - k - m$
$L_3 = m(k - 1)$
$L_4 = k - 1$

where m = no. of covariates, k = no. of groups, N = overall sample, and
$N_k$ = sample size for group k.

Homogeneity of Regression

Source            Sum of Squares   Degrees of Freedom   Mean Square   F Ratio
pooled within     S3               L3                   S3/L3         (S3/L3)/(S1/L1)
unpooled within   S1               L1                   S1/L1

Analysis of Covariance

Source            Adj. Sum of Squares   Degrees of Freedom   Adj. Mean Sq.   F Ratio
SS treatments     S4                    L4                   S4/L4           (S4/L4)/(S2/L2)
residual          S2                    L2                   S2/L2

$B = D_w^{-1} A_w$ = adjustment coefficients for covariates

$\bar{y}_{\mathrm{adj},k} = \bar{y}_k - \sum_{i=1}^{m} (\bar{x}_{ik} - \bar{x}_i) B_i$ = adjusted group mean for experimental variable
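A compact NumPy sketch of the whole suboperation, assuming each group arrives as an array whose first column is the experimental variable y and whose remaining m columns are covariates. `ancova_one_way` and its dictionary keys are hypothetical names; group and overall standard deviations are not computed here.

```python
import numpy as np


def ancova_one_way(groups):
    """Sums of squares S1..S5, degrees of freedom L1..L4, both F ratios,
    adjustment coefficients B and adjusted group means."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k, m = len(groups), groups[0].shape[1] - 1
    n = sum(g.shape[0] for g in groups)

    def cross_products(x):
        dev = x - x.mean(axis=0)          # sum X_i X_j - (sum X_i)(sum X_j) / N
        return dev.T @ dev

    def residual_ss(c):
        z, a, d = c[0, 0], c[1:, 0], c[1:, 1:]
        return z - a @ np.linalg.solve(d, a)     # Z - A' D^-1 A

    c_k = [cross_products(g) for g in groups]
    w = sum(c_k)                                  # pooled within-groups W
    t = cross_products(np.vstack(groups))         # overall T
    s1 = sum(residual_ss(c) for c in c_k)
    s2 = residual_ss(w)
    s5 = residual_ss(t)
    s3, s4 = s2 - s1, s5 - s2
    l1, l2, l3, l4 = n - (m + 1) * k, n - k - m, m * (k - 1), k - 1
    b = np.linalg.solve(w[1:, 1:], w[1:, 0])     # B = D_w^-1 A_w
    grand = np.vstack(groups).mean(axis=0)
    adjusted = [g[:, 0].mean() - (g[:, 1:].mean(axis=0) - grand[1:]) @ b
                for g in groups]
    return {
        "S": (s1, s2, s3, s4, s5),
        "L": (l1, l2, l3, l4),
        "F_homogeneity": (s3 / l3) / (s1 / l1),
        "F_treatments": (s4 / l4) / (s2 / l2),
        "B": b,
        "adjusted_means": adjusted,
    }
```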
III. Restrictions

Suboperations (5), (6), and (7), tests of differences and analysis of
variance and covariance, require the data to be divided into 2 or more
subgroups.

IV. Parameters


T-TEST and ANALYSIS OF VARIANCE and COVARIANCE

Parameter
Number      Description

1           Suboperations 1-7. (See above.)
2           Number of subgroups (if applicable).
3           0 - Count blanks as zeros
            1 - Count blanks as missing data
4           0 - use N-1 as degrees of freedom
            1 - use N as degrees of freedom (See Special Comments.)
5           0 - pooled standard error in option 5
            1 - non-pooled standard error (variances assumed unequal).


V. Special Comments

All T-TEST input is provided through a $INPUT card. Input addresses of
subgroups for options 5, 6, and 7 are listed on a $INPUT card. (See section
on SOUPAC Input/Output.) See examples for illustration. If options 1 or 2
are used, provide only 1 input address on a $INPUT card. For option 3, provide
two addresses, the first for the sample being tested, the second for the
criterion means. For option 4, provide two input addresses, the first for the
means, standard deviations and sample size of the sample, the second for the
criterion means. No second address with options 3 and 4 means a zero criterion
mean is to be used.

If a row vector of means is used, its length must equal the number of
variables.

Most work requires N-1 degrees of freedom. Option 7 does not check for
missing data.

VI. Examples

1. A Complete Program

/*ID    [accounting information]
//  EXEC   SOUP
//SYSIN  DD  *
T-T (5)(3)(1).
$INP(C)(C)(C).
END S
DATA (10)(10F2.0)
   [data cards for group 1]
END#
DATA (10)(10F2.0)
   [data cards for group 2]
END#
DATA (10)(10X,10F2.0)
   [data cards for group 3]
END#


This is a complete SOUPAC program to do T-tests on ten variables and three
groups taken pairwise, i.e. 1 vs 2, 1 vs 3 and 2 vs 3; thus 30 T's will
result, printed in 10 tables. Note that data decks are stacked one behind
another, each with its own DATA statement and END# card. A format
appropriate to the data is given. The form of this is optional, but all groups
should have the same number of variables. A check for missing data, coded
blank or -0.0, will be made.

If the T-T card were replaced by T-T(6)(3)(1)., an analysis of variance
would be performed, yielding an F-ratio for each variable. Other cards could
remain the same.

2.   Other examples of T-T and corresponding $INPUT cards follow. Note that
input from S (sequential) units requires that data be stored there earlier in
the same run, or permanently stored in the system. Data decks will always
be called for in the order listed (unless otherwise specified).


T-T (1).               Paired T-Test on data stored on S1.
$INP(S1).

T-T (2)( )(1)(1).      Paired T-Test on data stored on S15, doing the test
$INP(S15).             on all possible pairs, checking for missing data and
                       using N degrees of freedom.

T-T (3).               T-Test of population means on S1 against criterion
$INP(S1)(S3).          means on S3.

T-T (5)(2).            T-Test of group means of two groups, one on S1 and
$INP(S1)(S2).          the other on S2.

T-T (6)(4)(1).         One-way analysis of variance over four groups, one
$INP(S1)(C)(C)(S3).    on S1, two from cards, and one from S3, checking
                       for missing data.




CORRELATIONS  AND  REGRESSION  PACKAGE 






BISERIAL  CORRELATION 


General  Description 

This  program  calculates  the  following  coefficients  for  each  com- 
bination of  one  dichotomous  and  one  continuous  variable. 


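Case totals:

   % cases in p = N_p/N
   % cases in q = N_q/N
   Total cases  = N

Mean:

$$\bar X_p = \frac{\sum X_p}{N_p}, \qquad \bar X = \frac{\sum X}{N}$$

(\bar X_q is computed in the same way from the cases in the q category.)

Standard deviation:

$$S_p^2 = \frac{\sum X_p^2}{N_p} - \bar X_p^2, \qquad S = \sqrt{\frac{\sum X^2}{N} - \bar X^2}$$

Biserial r:

$$r = \frac{\bar X_p - \bar X_q}{S}\cdot\frac{pq}{h}$$

where p = proportion of cases in the 0 category
      q = proportion of cases in the 1 category
      h = height (ordinate) of the normal curve at the point dividing the
          p and q areas, computed from normal tables (h = .3989 when p = q = .5)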

The  program  checks  for  missing  data,  and  computes  the  above  measures 
only  for  those  cases  where  both  dichotomous  and  continuous  variables  are 
present. 
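The formulas above can be checked with a short Python sketch for one dichotomous and one continuous variable. It is a sketch under stated assumptions, not the program's code: missing values are assumed to be represented by None (the program itself reads blanks), p is taken as the 0 category as defined above, and the standard library's NormalDist stands in for normal tables when finding h. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def biserial_r(dichotomous, continuous):
    """Biserial r for one 0/1 variable and one continuous variable.

    Missing values are assumed to be None; a case is used only when both
    variables are present.
    """
    pairs = [(d, x) for d, x in zip(dichotomous, continuous)
             if d is not None and x is not None]
    n = len(pairs)
    xp = [x for d, x in pairs if d == 0]  # p category: cases coded 0
    xq = [x for d, x in pairs if d == 1]  # q category: cases coded 1
    p, q = len(xp) / n, len(xq) / n

    mean_p = sum(xp) / len(xp)
    mean_q = sum(xq) / len(xq)
    mean = sum(x for _, x in pairs) / n
    # Total standard deviation with divisor N, as in the formulas above.
    s = sqrt(sum(x * x for _, x in pairs) / n - mean ** 2)

    # Ordinate of the unit normal curve at the point splitting p and q.
    unit = NormalDist()
    h = unit.pdf(unit.inv_cdf(p))
    return (mean_p - mean_q) / s * p * q / h
```

For example, biserial_r([0, 0, 1, 1], [3, 5, 1, 3]) gives about .8862; here p = q = .5, so h = .3989.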


Type  of  Input 

Data is read row-wise from either tape or cards.  All dichotomous
variables must come first in each row.  They should be coded with
0 and 1.


Parameters 

The program call card requires the following parameters after the program name,
BISERAL R:

Parameter
Number     Use or Meaning

  1        Input Address.  CARDS or SEQUENTIAL 1-15.
  2        Output Address of correlations.  SEQUENTIAL 1-15
           and/or PRINT.
  3        Number of dichotomous variables.
  4        Number of continuous variables.
  5        0 - treat blanks as missing data
           1 - count blanks as zeros


CANONICAL  ANALYSIS 


I.  General  Description 

The CANONICAL CORRELATION program provides a multivariate test of the
hypothesis that two sets of normally distributed variables are independent.
The larger set of variables (the predictor variables) is considered to have
q members, and the smaller set (the criterion variables) has p members.  This
program also linearly transforms each set of variables into a new set of
independent variables (or dimensions) such that the first new predictor (a
linear combination of the original predictors) has maximum correlation with
the first new criterion variable.  The second new predictor is maximally
correlated with the second new criterion, and so on (with the constraint
that each new variable is uncorrelated with the previous new variables
derived from the same set of original variables).


Let the criterion set consist of the p variables x_1, ..., x_p and the
predictor set of the q other variates x_{p+1}, ..., x_{p+q}.  Assume p ≤ q.
We then look for weighting matrices U_{αβ} and W_{ab} such that

$$\xi_\alpha = \sum_{\beta=1}^{p} U_{\alpha\beta}\, x_\beta, \qquad \alpha = 1, 2, \ldots, p$$

$$\eta_a = \sum_{b=p+1}^{p+q} W_{ab}\, x_b, \qquad a = 1, 2, \ldots, q$$

The variables ξ and η have the following properties:

1.  They are standardized variables.

2.  Within each set, the ξ's are independent and the η's are independent, i.e.
within the set ξ_α (as α runs from 1 through p) and within the set η_a
(as a runs from 1 through q) the correlations are zero.

3.  The correlation between any ξ and any η is zero except for p correlations
λ_1, ..., λ_p.

The purpose of this program is to find the p correlations λ_1, ..., λ_p and the
weighting matrices U_{αβ} and W_{ab}.

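II.   Formulas and Calculations

The correlation matrix R is first partitioned into:

$$R = \begin{bmatrix} A & C' \\ C & B \end{bmatrix}$$

where A = correlations among predictors
      B = correlations among criteria
      C = correlations between predictors and criteria (criteria on the
          rows, predictors on the columns)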
Standardized regression coefficients:

$$\beta = A^{-1}C'$$

Multiple R-squared (the diagonal elements give R² for each criterion):

$$R^2 = CA^{-1}C' = C\beta$$

The following equation is solved for λ and U:

1)  $$(CA^{-1}C' - \lambda^2 B)\,U = 0$$

    where A, B, and C are as above, λ² = canonical r², and U = criterion
    weighting matrix.

The solution is obtained by using the following derived forms:

2)  $$(B - \gamma I)\,h = 0$$

    where γ represents eigenvalues of B, I is the identity matrix, and h
    represents eigenvectors of B.

3)  $$\left[(HD^{-1/2})'(CA^{-1}C')(HD^{-1/2}) - \lambda^2 I\right] v = 0$$

    where D is the diagonal matrix of γ, H is the matrix of h vectors, λ²
    represents eigenvalues of E, and v represents eigenvectors of E, with
    E = (HD^{-1/2})'(CA^{-1}C')(HD^{-1/2}).

4)  $$U = HD^{-1/2}V$$

    where V is the matrix of v vectors.

    $$B = HDH'$$

    is needed only to prove the equivalence of 1) with 2)-4).

The predictor weighting matrix:

$$W = \beta\,U\,F^{-1/2}$$

where F is the diagonal matrix of λ², i.e. the elements of F^{-1/2} are 1/λ
on the diagonal and 0 off the diagonal.

Wilks' lambda for the j-th function:

$$\Lambda_j = \prod_{i=j}^{p} (1 - \lambda_i^2)$$

Chi-square:

$$\chi^2_j = \ln(\Lambda_j)\left(\frac{p+q+3}{2} - n\right)$$

where n is the sample size.
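Steps 1) through 4) and the predictor weights can be sketched with NumPy. This is a minimal illustration, not the program's code: it assumes the correlation matrix has the q predictors first (Parameter 7 = 1), that A and B are non-singular, and that C holds criteria on the rows and predictors on the columns. The function name and argument order are hypothetical, and eigenvector signs may differ from the program's output.

```python
import numpy as np


def canonical_from_correlations(R, q, p):
    """Canonical correlations and weights from a (q+p) x (q+p) correlation
    matrix whose first q variables are the predictors."""
    A = R[:q, :q]                     # correlations among predictors
    B = R[q:, q:]                     # correlations among criteria
    C = R[q:, :q]                     # criteria (rows) x predictors (columns)
    beta = np.linalg.solve(A, C.T)    # standardized regression coefficients A^-1 C'
    M = C @ beta                      # C A^-1 C'

    gamma, H = np.linalg.eigh(B)      # step 2: eigenvalues/vectors of B
    T = H @ np.diag(gamma ** -0.5)    # H D^-1/2
    E = T.T @ M @ T                   # step 3: the matrix E
    lam2, V = np.linalg.eigh(E)
    order = np.argsort(lam2)[::-1]    # largest canonical r^2 first
    lam2, V = lam2[order], V[:, order]

    U = T @ V                         # step 4: criterion weights
    W = beta @ U @ np.diag(lam2 ** -0.5)  # predictor weights, W = beta U F^-1/2
    return np.sqrt(lam2), U, W
```

With these weights the new criterion variables satisfy U'BU = I, the new predictors satisfy W'AW = I, and U'CW is the diagonal matrix of canonical correlations, which matches properties 1 to 3 of the General Description.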


III.   Input

Input to the CANONICAL ANALYSIS program consists of a correlation matrix.
The variables comprise a set of predictor variables and a set of criterion
variables.  Either set may be first on the input data, but the two types of
variables cannot be intermixed on the input data.  The TRANSFORMATION program
may be used to reorder the variables if they are mixed on the card data deck.

IV.   Significance  Tests 


Included in the printed output of the CANONICAL program is a Chi-square
value for each of the eigenvalues λ² computed in the program.  The Chi-square
values printed are determined from the Wilks' lambda values using the procedure
outlined by Bartlett (see Section X).  The Chi-square values provide a test of
the null hypothesis that the p variates are unrelated to the q variates.  If
there is at least one way in which a linear combination of the predictor variables
is correlated with a linear combination of the criterion variables, the first
Chi-square value will be significant.  The second Chi-square may then be examined.
This Chi-square is a test of a second relationship after the first relationship
has been removed.  If this Chi-square is significant, a second linear
combination of the predictor variables is correlated with a second linear
combination of the criterion variables.  This process continues until the
first non-significant Chi-square is found.  All Chi-squares beyond that point
will be non-significant.
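The sequential procedure can be sketched in Python from Bartlett's Wilks' lambda and Chi-square formulas. The degrees of freedom (p - j + 1)(q - j + 1) are the usual ones for Bartlett's test and are an assumption here, since the write-up does not state them; the function name is hypothetical.

```python
import math


def bartlett_tests(canonical_r, p, q, n):
    """Sequential Wilks' lambda and Bartlett chi-square for each canonical root.

    canonical_r: canonical correlations, largest first (p of them).
    Returns a list of (j, Lambda_j, chi_square_j, df_j) for j = 1..p.
    """
    results = []
    for j in range(1, p + 1):
        wilks = 1.0
        for r in canonical_r[j - 1:]:
            wilks *= 1.0 - r * r               # Lambda_j = prod_{i>=j} (1 - lambda_i^2)
        chi_square = math.log(wilks) * ((p + q + 3) / 2 - n)
        df = (p - j + 1) * (q - j + 1)         # usual Bartlett df (assumed)
        results.append((j, wilks, chi_square, df))
    return results
```

For canonical correlations .6 and .3 with p = 2, q = 3 and n = 50, the first Chi-square is about 24.87 on 6 degrees of freedom; the second is then examined only if the first is significant.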

V.   Output 

The  output  consists  of  the  following: 

1.  The matrix of standardized regression coefficients.  This is the matrix of
coefficients which would be formed if the raw data used to calculate the
correlation matrix input had been converted to standard scores.  The predictor
variables are on the rows of the matrix and the criterion variables
are on the columns of the matrix.

2.  A squared multiple correlation (R²) for each of the criterion variables.
The first R² value is the squared multiple correlation of the first criterion
variable with the entire set of predictor variables.  The second is for
the second criterion variable with the set of predictors, etc.

3.  A set of eigenvalues λ², canonical correlations λ, Wilks' lambdas,
Chi-squares, and degrees of freedom (see Section IV).  (Printed)

4.  A matrix of criterion weights (Parameter Number 3).

5.  A matrix of predictor weights (Parameter Number 4).
VI.      Restrictions 

A.  Precalculated  correlation  matrices  should  be  punched  with  sufficient 
accuracy;  accumulated  round-off  can  cause  malfunctions  and  errors  in  results. 
Raw  data  and  the  CORRELATION  program  should  be  used  whenever  possible,  or 
maximum  accuracy  preserved  in  punching. 

B.  The  matrices  A  and  B  must  both  be  non-singular. 

C.  Number  of  criteria  (p)  must  be  less  than  or  equal  to  the  number  of 
predictors  (q). 

VII.   Parameters


The  parameters  for  the  CANONICAL  ANALYSIS  program  follow  the  program 
mnemonic  CAN  on  the  main  program  card.   Each  parameter  must  be  enclosed  in 
parentheses.   The  parameters  must  appear  in  the  order  given  below.   If  a  param- 
eter is  not  needed,  do  not  punch  anything  between  its  parentheses.   All  paren- 
theses after  the  last  non-empty  pair  may  be  omitted. 
Parameter
Number     Description

  1        Input Address (correlation matrix).  CARDS or
           SEQUENTIAL 1-5.
  2        Sample size of raw data, needed for Chi-square.
           May be zero or blank if unknown.
  3*       Output Address of criterion weighting matrix.
           SEQUENTIAL 1-5 and/or PRINT.
  4*       Output Address of predictor weighting matrix.
           SEQUENTIAL 1-5 and/or PRINT.
  5        Number of predictor variables.
  6        Number of criterion variables.  (Must be less
           than or equal to number of predictors.)
  7        Order of variable sets on input:
           1 if predictors are first
           2 if criteria are first
  8        1 if regression coefficients are to be printed
  9        1 if squared multiple correlations (R²) are to be
           printed
  10       Output Address of eigenvalues.  PRINT is not
           valid.

* It is possible to print in F format and/or punch the output from these
parameters.  If you need either of these options, see the section in the
Introduction on Input and Output.

VIII.   Special  Comments 

This  program  does  not  check  for  missing  data.   All  blank  spaces  are  read 
as  zeros. 


IX.   Examples

A

/*ID    [accounting information]
//  EXEC   SOUP
//SYSIN  DD   *
CANONICAL  (CARDS) (50)
(PRINT) (PRINT) (15) ( 5) (1)(1)(1).
ENDS
DATA(20)(8F10.7)

     punched correlations

END#
/*

B

/*ID    [accounting information]
//  EXEC   SOUP
//SYSIN  DD  *
COR(C)(P)(S1/P).
CAN(S1)(50)(P)(P)(15)(5)(1)(1)(1)
ENDS
DATA(20)(20F4.0)

     raw data

END#
/*
Examples A and B illustrate the use of the program card for Canonical
correlations.  In these examples the input data consists of 15 predictor and
5 criterion variables and will be a card deck.

In Example A the card deck is a punched correlation matrix; note that a
possible format is 8 values to a card with 10 digits each.  The format will
be reused until 20 values are found per row, so each of the 20 rows takes 3 cards.
Thus the deck will be contained on 60 cards.  The Canonical correlations program
reads these cards.

In Example B the card deck is raw data; the format could be any string
adequate to read the data.  The data deck is read by the CORRELATION program,
which produces correlations and stores them on SEQUENTIAL 1, from which
the Canonical program reads them.

The same calculations will be performed by the Canonical program in both
examples A and B.  Fifteen variables are predictors, and these are the first 15
in each row.  The sample size of 50 was provided in order to get all
significance tests.  The printed output will be the canonical correlations, the
criterion weighting matrix, the predictor weighting matrix, and significance
tests.
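The card count for a punched matrix follows from the way a FORTRAN-style format is reused: a new card is started until a full row has been read, so a partly filled last card still counts. A small Python check of that arithmetic (the helper name is hypothetical):

```python
import math


def cards_needed(rows, values_per_row, values_per_card):
    # The format is reused, starting a new card, until a full row has been read,
    # so a partly filled last card still counts as a whole card.
    return rows * math.ceil(values_per_row / values_per_card)


print(cards_needed(20, 20, 8))   # 20 rows read with 8F10.7 -> 60
```

With a format such as 20F4.0, which holds all 20 values on one card, the same 20 rows take only 20 cards.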

X.   References
CORRELATION


I.   General  Description 


The main purpose of the CORRELATION program is the calculation of
Pearson product-moment correlations (hereafter referred to as correlations
in this writeup).  A correlation measures the linear dependency between
two variables, and this program calculates a correlation for each pair
of input variables.  The square of a correlation, sometimes called the
coefficient of determination, represents the proportional reduction in
variance of one variable due to a linear relationship with another.  Thus
the coefficient of determination measures the strength of a linear
relationship, or the proportion of variance accounted for by a linear rule.

The CORRELATION program automatically produces other types of
correlation coefficients, because the calculations required are identical.
Thus point biserial coefficients of correlation (often preferred to
biserial correlation), phi coefficients (an alternative to tetrachoric
coefficients), and Spearman's rank order correlations can be readily
obtained.  For example, if the input includes dichotomous variables, the
output will contain a mixture of phi's, point biserials, and ordinary
correlations.  (A point biserial correlation is a correlation between
a dichotomous variable and a continuous variable.)  If the input to
the correlation program consists of rank ordered (ordinal) data, the
output will be Spearman's rank order correlations.  (See Walker and Lev,
Chapter 11, for comparisons and comments on the above mentioned coefficients.)

In  the  process  of  calculating  the  correlations,  the  means  and  standard 
deviations  of  the  individual  variables  are  computed,  as  are  the  cross- 
products  and  covariances  between  variables.  After  the  correlations  have 
been  calculated,  they  are  used  to  calculate  the  linear  regression  coefficients 
and  corresponding  intercept  terms  needed  for  predicting  each  variable  from 
each  other  variable. 

II.  Input 

Input to the CORRELATION program consists of a set of independent
observations on two or more variables.  The data is considered as a
two-dimensional array (or matrix) of numbers with each column containing
the observations on one variable, and each row consisting of one
observation on each variable.  If we use the letter X to represent the matrix
of raw data, we let X_ij represent the element in the i-th row (where
i = 1, 2, ..., N) and the j-th column (where j = 1, 2, ..., M).  In other
words, we have N observations (rows) and M variables (columns) in our data
matrix X.

III.  Formulas  and  Calculations 


The following formulas define certain statistics and illustrate how they
are calculated within the program.  The subscript i refers to
observations (or individuals) and runs from 1 to N.  The subscripts j and
k refer to variables, and they run from 1 to M.
Mean (of variable j):

$$\bar X_j = \frac{1}{N}\sum_{i=1}^{N} X_{ij}$$

Covariance* (between variables j and k):

$$C_{jk} = \frac{\sum_{i=1}^{N}(X_{ij}-\bar X_j)(X_{ik}-\bar X_k)}{N-1}
        = \frac{N\sum_{i=1}^{N}X_{ij}X_{ik} - \left(\sum_{i=1}^{N}X_{ij}\right)\left(\sum_{i=1}^{N}X_{ik}\right)}{N(N-1)}$$

Standard Deviation* (of variable j):

$$S_j = \sqrt{\frac{\sum_{i=1}^{N}(X_{ij}-\bar X_j)^2}{N-1}}
      = \sqrt{\frac{N\sum_{i=1}^{N}X_{ij}^2 - \left(\sum_{i=1}^{N}X_{ij}\right)^2}{N(N-1)}}$$

Correlation (between variables j and k), with all sums running over i = 1, ..., N:

$$R_{jk} = \frac{C_{jk}}{S_j S_k}
 = \frac{N\sum X_{ij}X_{ik} - \left(\sum X_{ij}\right)\left(\sum X_{ik}\right)}
        {\sqrt{N\sum X_{ij}^2 - \left(\sum X_{ij}\right)^2}\,\sqrt{N\sum X_{ik}^2 - \left(\sum X_{ik}\right)^2}}$$

From the equation X_ij = B_jk X_ik + A_jk, the program calculates

Linear Regression Coefficient (for predicting variable j from variable k):

$$B_{jk} = R_{jk}\,\frac{S_j}{S_k}$$

Intercept (constant term in the equation for predicting variable j from variable k):

$$A_{jk} = \bar X_j - B_{jk}\,\bar X_k$$


*NOTE:  the sample covariances and sample variances (the squares of the sample
standard deviations) are unbiased estimates of the corresponding population
parameters.  The definitions given here follow the practice of many current
statisticians.  [See Anderson (1958), Chapter 3, for example.]
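A pure-Python sketch of the computational (raw-sum) forms above may help when checking printed output by hand. The function name is hypothetical; like the program, it does no missing-data checks, and B[j][k] and A[j][k] are the coefficient and intercept for predicting variable j from variable k.

```python
import math


def correlation_stats(X):
    """X: list of N rows, each holding M values.

    Returns means, standard deviations, covariances, correlations,
    regression coefficients B[j][k] and intercepts A[j][k].
    """
    N, M = len(X), len(X[0])
    sums = [sum(row[j] for row in X) for j in range(M)]
    cross = [[sum(row[j] * row[k] for row in X) for k in range(M)]
             for j in range(M)]

    means = [s / N for s in sums]
    # Raw-sum form of the unbiased covariance, divisor N(N - 1).
    cov = [[(N * cross[j][k] - sums[j] * sums[k]) / (N * (N - 1))
            for k in range(M)] for j in range(M)]
    sds = [math.sqrt(cov[j][j]) for j in range(M)]
    R = [[cov[j][k] / (sds[j] * sds[k]) for k in range(M)] for j in range(M)]
    # Predicting variable j from variable k: X_j = B_jk X_k + A_jk.
    B = [[R[j][k] * sds[j] / sds[k] for k in range(M)] for j in range(M)]
    A = [[means[j] - B[j][k] * means[k] for k in range(M)] for j in range(M)]
    return means, sds, cov, R, B, A
```

For the three rows (1, 2), (2, 4), (3, 7), predicting the second variable from the first gives B = 2.5 and A = -2/3, the same slope and intercept as an ordinary least-squares line.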

IV.   Significance Tests

If we assume that two variables (indexed by j and k) have a bivariate
normal distribution, there is a test statistic for testing the hypothesis
that the correlation in the population is zero (or equivalently that either
regression coefficient is zero).  Even for a relatively small sample size
(N), this hypothesis can be tested using the t ratio:

$$t = \frac{R_{jk}\sqrt{N-2}}{\sqrt{1 - R_{jk}^2}}$$

with N - 2 degrees of freedom.  Other types of hypotheses can be tested
through use of the Fisher R to Z transformation.  [See Hays (1963), pages
529-533, for example.]
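Both tests can be sketched in Python. The second function illustrates one common use of the Fisher transformation, Z = artanh(R), comparing two correlations from independent samples with z = (Z1 - Z2) / sqrt(1/(N1 - 3) + 1/(N2 - 3)); it is supplied here as an illustration and is not described as output of the CORRELATION program. Both function names are hypothetical.

```python
import math


def t_for_correlation(r, n):
    """t ratio for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def compare_correlations(r1, n1, r2, n2):
    """Normal deviate for H0: rho1 = rho2 (independent samples), via Fisher's Z."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    return (z1 - z2) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
```

For example, R = .5 with N = 27 gives t = 2.5 / sqrt(.75), about 2.887, on 25 degrees of freedom.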

V.   Output 

Output from the CORRELATION program may consist of any or all of
the statistics from Section III above, by using parameters 2 through 7.
Any output from this program may be printed and/or output to temporary
storage (SEQUENTIAL 1-5).  The means, standard deviations, and the sample
size (N) are output as a matrix with M rows (one for each variable) and
three columns (the third column will have a constant value of N for all
variables).  Correlations, covariances, and cross-products are printed
as lower triangular matrices, while the regression coefficients and
intercepts are printed as square matrices.  However, all five of these
matrices are stored as square matrices.

VI.  Restrictions 

The CORRELATION program will accept an unlimited number of
observations, but the number of variables is limited as noted in the
section on PROGRAM LIMITS in the INTRODUCTION.

VII.  Parameters 


The  parameters  for  the  CORRELATION  program  follow  the  program 
name  on  the  main  program  card.  Each  parameter  must  be  enclosed  in 
parentheses.  The  parameters  must  appear  in  the  order  given  below. 
If  a  parameter  is  not  needed,  do  not  punch  anything  between  its 
parentheses.  All  parentheses  after  the  last  non-empty  pair  may  be 
omitted. 


Parameter
Number     Use or Meaning

  1        Input Address of raw data (X matrix).
           CARDS or SEQUENTIAL 1-15.
  2        Output Address for means, standard
           deviations, and sample size.
           SEQUENTIAL 1-15 and/or PRINT.
  3*       Output Address for correlation matrix
           (R).  SEQUENTIAL 1-15 and/or PRINT.
  4*       Output Address for cross-products
           matrix.  SEQUENTIAL 1-15 and/or PRINT.
  5*       Output Address for covariance matrix (C).
           SEQUENTIAL 1-15 and/or PRINT.
  6*       Output Address for matrix of regression
           coefficients (B).  SEQUENTIAL 1-15 and/or
           PRINT.
  7*       Output Address for intercepts (matrix A).
           SEQUENTIAL 1-5 and/or PRINT.
  8        1 if last variable in each row is a
           weighting factor.

* It is possible to print in F format and/or punch the output from these
parameters.  If you need either of these options, see the section in
the INTRODUCTION on INPUT and OUTPUT.

VIII.   Special  Comments 

1.  This program does not check for missing data.  All blank spaces
are read as zeros.  If you have missing data, use the MISSING
DATA CORRELATION program.

2.  In  the  output  matrices  of  regression  coefficients  and  intercepts, 
the  row  number  refers  to  the  dependent  variables,  and  the  column 
numbers  refer  to  the  independent  variables. 

3.  If  a  variable  is  constant,  an  error  message  will  be  printed  and 
all  correlations  with  that  variable  will  be  set  to  zero. 

4.  In order to have the program perform its calculations separately
for subsamples of the data, see the section on CONTROL VARIABLES
in the INTRODUCTION.


IX.  Examples

1A

/*ID   <accounting information>
//  EXEC  SOUP
//SOUP.SYSIN  DD  *
CORRELATIONS  (CARDS) (  ) (PRINT) .
END  SOUPAC
DATA  (6)(6F2.0)


1B

/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
COR  (C)(  )(P).
ENDS
DATA  (6)(6F2.0)


END#


Example 1A illustrates the usage of the CORRELATION program.  Notice
that all words are spelled out, although this is unnecessary.  Notice also
that correlations are to be printed out, although the means and standard
deviations are not.  Example 1B will perform exactly the same computations
as 1A, except that all instructions have been abbreviated to make keypunching
easier.
/*ID  <accounting information>
//  EXEC  SOUP
//SYSIN  DD  *
COR  (C)(P)(P/S1).
PRINCIPAL  AXIS  FROM  (S1)  TO  (S2/P)  WITH  (10)  FACTORS  AND
(100)  PERCENT  OF  THE  VARIANCE  TO  BE  REMOVED.
VARIMAX  ROTATION  FROM  (S2)  TO  (PRINT).
ENDS
DATA  (20)(10F4.0,5F6.2/10X,5F4.1)


END#

In the second example, the CORRELATION program first prints the means
and standard deviations.  Then it prints the correlation matrix and stores
it on SEQUENTIAL 1 (S1).  The PRINCIPAL AXIS program then performs a
principal components analysis and outputs 10 components to S2.  VARIMAX
then rotates these 10 components, using the VARIMAX criterion, and prints
the results.

X.   References 

T. W. Anderson, An Introduction to Multivariate Statistical Analysis;
John Wiley and Sons, Inc., 1958.

E. C. Bryant, Statistical Analysis; McGraw-Hill, 1960, pp. 113-135.

W. L. Hays, Statistics for Psychologists; Holt, Rinehart and Winston,
1963.

H. M. Walker and J. Lev, Statistical Inference; Henry Holt and Company,
New York, 1960.




MISSING  DATA  CORRELATION 


I.   General Description

The  MISSING  DATA  CORRELATION  program  calculates  the  following  coeffi- 
cients for  every  combination  of  variables: 


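For each pair of variables i and j, let N_ij be the number of cases in which
both variables are present, and let X and Y denote the values of variables i
and j in those cases.  Then

Mean:

$$\bar X_{ij} = \frac{\sum X}{N_{ij}}$$

Standard Deviation:

$$S_{X_{ij}} = \sqrt{\frac{N_{ij}\sum X^2 - \left(\sum X\right)^2}{N_{ij}(N_{ij}-1)}}$$

Covariance:

$$C_{ij} = \frac{N_{ij}\sum XY - \left(\sum X\right)\left(\sum Y\right)}{N_{ij}(N_{ij}-1)}$$

Correlation:

$$r_{ij} = \frac{C_{ij}}{S_{X_{ij}}\,S_{Y_{ij}}}$$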
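These pairwise formulas can be sketched in Python for one pair of variables. The sketch assumes a blank field arrives as None and that the default missing-data code is minus zero; because Python's -0.0 compares equal to 0.0, the sign is tested with math.copysign, so a true zero is never treated as missing, in line with Parameter 5. The function names are hypothetical.

```python
import math


def is_missing(value, code=None):
    """Blank (None) is missing; otherwise compare with the missing-data code.

    A blank or zero code means minus zero, so a true 0.0 is never missing.
    """
    if value is None:
        return True
    if code is None or code == 0:
        return value == 0.0 and math.copysign(1.0, value) < 0
    return value == code


def pairwise_r(x, y, code=None):
    """Correlation of x and y using only cases where both are present.

    Returns (r, N_ij).
    """
    pairs = [(a, b) for a, b in zip(x, y)
             if not is_missing(a, code) and not is_missing(b, code)]
    n = len(pairs)
    sx = sum(a for a, _ in pairs)
    sy = sum(b for _, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    denom = n * (n - 1)
    cov = (n * sxy - sx * sy) / denom
    var_x = (n * sxx - sx * sx) / denom
    var_y = (n * syy - sy * sy) / denom
    return cov / math.sqrt(var_x * var_y), n
```

Because each pair of variables may use a different N_ij, correlations computed this way need not come from the same sample, which is the reason for the warning in Special Comment A.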


II.   Restrictions

The maximum number of variables for this program is 100.

The input data to this program may come from any source conforming to
SOUPAC.  Output may be printed, and the correlation matrix may be placed on
any storage device conforming to SOUPAC.


III.   Parameters 


The  parameters  for  the  MISSING  DATA  CORRELATION  program  appear  on  the 
program  card.   They  must  follow  the  program  name  in  the  following  order: 


Parameter
Number     Use or Meaning

  1        Input Address.  CARDS or SEQUENTIAL 1-15.
           Default is CARDS.
  2        0 - printing as usual
           1 - printing is suppressed
  3        Output Address of correlation matrix.
  4        Output Address for sample sizes.
  5        Coding for missing data; if left blank or if
           zero is entered, minus zero is used as the check.
           It is NOT possible for this program to count
           true zeroes as missing data.  This parameter
           must be enclosed in asterisks.  Example: *99*.

NOTE:  All output is in double precision.


IV.   Special Comments

A.  The user is warned against further processing of the correlations
output by this program, because the correlations do not necessarily
come from the same sample.

B.  For control breaks, data must be presorted on the control variables
with the last variable changing fastest.  The maximum number of control
variables is 30.  Control variables begin on a new card with $C-B in column
1 and are enclosed in parentheses.

C.  If parameter 3 is a temporary storage address, the correlation matrices
can be stored there.  However, if control breaks are also being used,
only the first matrix, corresponding to the first control break, can be
saved.


MULTIPLE  CORRELATION 


I.  General Description

The MULTIPLE CORRELATION program calculates the following coefficients,
where n = sample size
      m = sum of weights, $m = \sum_{i=1}^{n} w_i$
      p = number of independent variables (Parameter 2)
      S = vector of standard deviations
      X̄ = independent variable means
      Ȳ = dependent variable means

Mean:

$$\bar X_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}$$

Raw Data Cross-Products:

$$(X'X)_{jk} = \sum_{i=1}^{n} x_{ij}\,x_{ik}$$

Covariance:

$$c_{jk} = \frac{(X'X)_{jk}}{n-1} - \frac{\left(\sum_{i=1}^{n} x_{ij}\right)\left(\sum_{i=1}^{n} x_{ik}\right)}{n(n-1)}$$

Standard Deviation:

$$s_j = \sqrt{c_{jj}}$$

Product Moment Correlation:

$$r_{jk} = \frac{c_{jk}}{s_j\,s_k}$$

The correlation matrix is then partitioned as follows:

$$\begin{bmatrix} A & B \\ B' & C \end{bmatrix}$$

where A is the independent variables correlation matrix, C is the dependent
variables correlation matrix, and B and B' are the cross-correlation matrices.

Standardized Regression Coefficients:

$$\beta = A^{-1}B$$

Deviation Covariance Matrix:

$$D = S\,(C - B'\beta)\,S'\;\frac{n-1}{n-p-1}$$
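The unweighted path through these formulas (standardized coefficients from A^-1 B, squared multiple correlations, F ratios, standard errors of estimate, unstandardized coefficients and intercepts) can be sketched with NumPy. This is a minimal sketch assuming complete data with no weights or control variables; the function name is hypothetical and it is not the program's own code.

```python
import numpy as np


def multiple_regression(X, Y):
    """X: n x p independent variables, Y: n x k dependent variables.

    Follows the correlation-matrix route: beta = A^-1 B.
    """
    n, p = X.shape
    Z = np.hstack([X, Y])
    R = np.corrcoef(Z, rowvar=False)
    s = Z.std(axis=0, ddof=1)                 # standard deviations, divisor n - 1
    A, B = R[:p, :p], R[:p, p:]               # independent, cross-correlations

    beta = np.linalg.solve(A, B)              # standardized coefficients, p x k
    R2 = np.diag(B.T @ beta)                  # squared multiple correlations
    F = R2 / (1 - R2) * (n - p - 1) / p       # F with p and n - p - 1 df
    se_est = s[p:] * np.sqrt((1 - R2) * (n - 1) / (n - p - 1))
    b = beta * s[p:] / s[:p, None]            # b_jk = (s_k / s_j) beta_jk
    b0 = Y.mean(axis=0) - X.mean(axis=0) @ b  # intercepts
    return beta, R2, F, se_est, b, b0
```

The unstandardized coefficients and intercepts obtained this way should agree with an ordinary least-squares fit that includes a column of ones for the intercept.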
This leaves the following matrix:

Deviation (Partial) Correlation Matrix:

$$D^{*}_{jk} = \frac{D_{jk}}{(D_{jj}\,D_{kk})^{1/2}}$$

Regression Covariance Matrix:

$$RC = S\,B'\beta\,S'\;\frac{1}{p}$$

Multiple Correlation:

$$R_j = \left[(B'\beta)_{jj}\right]^{1/2}$$

F Ratio:

$$F_{j;\,p,\,n-p-1} = \frac{R_j^2}{1-R_j^2}\cdot\frac{n-p-1}{p}$$

Standard Error of Estimate:

$$S_{e_j} = s_j\left[(1-R_j^2)\,\frac{n-1}{n-p-1}\right]^{1/2}$$

Unstandardized Regression Coefficient:

$$b_{j.k} = \frac{s_k}{s_j}\,\beta_{j.k}$$

Dependent Variable Intercept:

$$b_{o.k} = \bar Y_k - \sum_{j=1}^{p} b_{j.k}\,\bar X_j$$

Standard Error of Unstandardized Regression Coefficient:

$$s_{b_{j.k}} = S_{e_k}\left[(X'X)^{-1}_{jj}\right]^{1/2}$$

Standard Error of Standardized Regression Coefficients:

$$s_{\beta_{j.k}} = \frac{s_j}{s_k}\,s_{b_{j.k}}$$

T = Regression Coefficient/Standard Error:

$$T_{j.k} = \frac{b_{j.k}}{s_{b_{j.k}}}$$

Predicted Dependent Variables:

$$y^{*}_k = b_{o.k} + \sum_{j=1}^{p} b_{j.k}\,x_j$$

Deviations from Observed:

$$z_i = y_i - y^{*}_i$$

Durbin-Watson Coefficient:

$$d_j = \frac{\sum_{i=2}^{n}(z_i - z_{i-1})^2}{\sum_{i=1}^{n} z_i^2}$$
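The Durbin-Watson coefficient is computed from the deviations z_i taken in case order, so the order of the data cards matters. A small Python sketch (the function name is hypothetical):

```python
def durbin_watson(deviations):
    """Durbin-Watson d for deviations (observed minus predicted) in case order."""
    numerator = sum((deviations[i] - deviations[i - 1]) ** 2
                    for i in range(1, len(deviations)))
    denominator = sum(z * z for z in deviations)
    return numerator / denominator
```

The value always lies between 0 and 4; deviations that alternate in sign push d toward 4, while long runs of same-signed deviations push it toward 0.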


For  reference  to  formulas  and  interpretations  see: 

E. C. Bryant, Statistical Analysis, New York, McGraw-Hill, 1960, pp. 198-22^1.

II.  Restrictions

Total variables < 450; independent plus dependent variables < 430.  (See Parameter 2.)

The input data to this program may come from any source conforming to
SOUPAC.  Output will be printed and/or stored as indicated.

III.  Parameters

The parameters for the MULTIPLE CORRELATION program appear on the
program call card.  (Most problems require only parameters 1, 2, 5, and 7;
see Example 1.)  They must follow the program name in this order:


Parameter
Number     Use or Meaning

  1        Input Address.  CARDS or SEQUENTIAL 1-15.
           (See Special Comments for order of variables.)
  2        Number of independent variables.
           Ind. var. + dep. var. + wt. var. + control var. < 450.
           Ind. var. + dep. var. < 430.
  3        Output address of predicted dependent variables.
           SEQUENTIAL 1-15 and/or PRINT.
  4        Output address of deviations from actual.
           SEQUENTIAL 1-15 and/or PRINT.
  5        Output address of Means and Standard Deviations.
           SEQUENTIAL 1-15 and/or PRINT.
           1st column contains Means.
           2nd column contains Standard Deviations.
           3rd column contains Sum of Weights.
           4th column contains Sample Size.
Parameter
Number     Use or Meaning

  6        Output address of coefficients (unstandardized).
           SEQUENTIAL 1-15; PRINT is default.  Coefficients
           for M independent and N dependent variables are
           written as N rows with N + M + 1 columns each.
           The i-th row contains, in order, the i-th intercept
           term, the M coefficients for the i-th dependent
           variable, a -1 in the M + i + 1 location, and
           0's in all other locations.  This format is
           compatible with the ECONOMETRICS REDUCED FORM
           AND RESIDUAL ANALYSIS program.
  7        Output address of correlation matrix.  SEQUENTIAL
           1-15 and/or PRINT.
  8        Output address of raw data cross-products matrix.
           SEQUENTIAL 1-15 and/or PRINT.
  9        Output address for covariance matrix.
           SEQUENTIAL 1-15 and/or PRINT.
  10       Output address of deviation covariance matrix.
           SEQUENTIAL 1-15 and/or PRINT.
  11       Output address of deviation (partial) correlation
           matrix.  SEQUENTIAL 1-15 and/or PRINT.
  12       Output address for regression covariance matrix.
           SEQUENTIAL 1-15 and/or PRINT.
  13       Output address of Durbin-Watson coefficients and sums
           of the second, third, and fourth powers of deviations.
           SEQUENTIAL 1-15 and/or PRINT.
           Row 1   Durbin-Watson Coefficients.
           Row 2   Σ(y - y*)²
           Row 3   Σ(y - y*)³
           Row 4   Σ(y - y*)⁴
  14       Output address of inverse of augmented independent
           variables cross-products matrix.  SEQUENTIAL 1-15
           and/or PRINT.
  15       If weighting factors are desired, code this parameter
           1 or -1 (see footnote 3).  The weights must be in each
           row and must be to the right of the dependent variables
           (see Special Comments below).  Weights should indicate
           a replication of an observation.  Leave this parameter
           blank or code a zero if weights are not wanted.
  16       Tolerance used to determine if the correlation matrix
           is singular.  If this parameter is left blank, a
           tolerance of 10^-5 will be used.  If any other
           tolerance is desired, it should be punched as
           *_._E-_*, where the blanks could be filled in as
           follows: *13.5E-10*.  This parameter must be enclosed
           in asterisks as shown in the examples.
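The row layout described for Parameter 6 can be illustrated with a short Python sketch that builds the N output rows from intercepts and coefficients. The function and argument names are hypothetical; indices in the code are 0-based, while the locations in the text are 1-based.

```python
def coefficient_rows(intercepts, coefs):
    """Build the Parameter 6 layout: N rows of N + M + 1 values.

    intercepts: N intercept terms, one per dependent variable.
    coefs: N lists of M unstandardized coefficients.
    """
    N, M = len(intercepts), len(coefs[0])
    rows = []
    for i in range(N):                  # i is 0-based here
        row = [0.0] * (N + M + 1)       # 0's in all other locations
        row[0] = intercepts[i]          # location 1: intercept term
        row[1:M + 1] = coefs[i]         # locations 2 .. M + 1: coefficients
        row[M + 1 + i] = -1.0           # 1-based location M + i + 1
        rows.append(row)
    return rows
```

With two dependent and two independent variables, the rows are [b0_1, b11, b21, -1, 0] and [b0_2, b12, b22, 0, -1], so each dependent variable appears with coefficient -1 in its own column.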


IV.   Special  Comments 

Sample Size, Regression Coefficients, Standard Errors, F Ratio, Multiple
Correlations, T Ratio, and Dependent Variables Intercept are printed by default.

Thirty control variables are allowed; they are specified by the normal
conventions, but these variables must be to the right of the dependent variables
and weights.  Control variables are not used in the calculations.  If control
breaks are used, only the first set of output can be stored on a Sequential address.
(Control variables must be presorted, either with SOUPSORT or by machine.)

Independent  variables  must  be  on  the  left,  then  dependent  variables,  weights, 
(if  any),  and  control  variables  (if  any). 

If  the  independent  variable  or  the  only  dependent  variable  is  constant  a 
message  will  be  printed  and  the  sample  will  be  discarded  after  computing  the 
correlation  matrix. 

The  index  "O"  on  printed  matrices  refers  to  the  intercept  term. 


V.  Examples

(1)

/*ID
//   EXEC  SOUP
//SYSIN  DD   *
MUL(C)(5)()()(P)()(P)
ENDS
DATA(7)(7F1.0)

END#
/*

This  program  reads  from  cards,  uses  5  independent  variables  and  2  dependent 
variables.   Means,  Standard  Deviations,  Correlations,  and  default  options  are 
printed. 
(2)

/*ID
// EXEC SOUP
//SYSIN DD *
TRA(C).
PER(1)(1,15)(20,22)(16,17)(19)(18)(23).
OUT(S1)(1,23).
PER(1)(1,15)(20,22)(16,17)(19)(18)(23).
ENDP
MUL(S1)(18)()(P)()()()(P)()()()()(P)(P)(1)*1.0E-8*.
$C-B(22)(23).
ENDS
DATA(23)(23F2.0)
END#
/*

This program reads from Sequential 1 and uses 18 independent variables, 2 dependent variables, weights, and control variables. Deviations, cross products, Durbin-Watson coefficients, sums of the second, third, and fourth powers of deviations, the inverse of the augmented independent-variable cross-products matrix, and the default options are printed.
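The PER card in example (2) arranges the 23 input columns in the order this program expects: independent variables first, then dependent variables, the weight, and the control variables. The short Python sketch below replays that reordering under a stated assumption: each parenthesized set after the first PER parameter is taken to be either a single variable number or an inclusive range. That fits its use here but is not a full description of PER, and the function names are hypothetical.

```python
def expand_index_sets(sets):
    """Expand index sets such as [(1, 15), (19,)] into 1-based variable numbers.

    A 1-tuple is a single variable; a 2-tuple is an inclusive range (assumed).
    """
    order = []
    for s in sets:
        if len(s) == 1:
            order.append(s[0])
        else:
            first, last = s
            order.extend(range(first, last + 1))
    return order


def split_row(row, n_indep, n_dep, has_weight):
    """Split a reordered row into independent, dependent, weight and control parts."""
    indep = row[:n_indep]
    dep = row[n_indep:n_indep + n_dep]
    pos = n_indep + n_dep
    weight = row[pos] if has_weight else None
    control = row[pos + 1:] if has_weight else row[pos:]
    return indep, dep, weight, control


# Index sets from the PER card in example (2), leaving out its first parameter.
per_sets = [(1, 15), (20, 22), (16, 17), (19,), (18,), (23,)]
order = expand_index_sets(per_sets)

raw = [100 + v for v in range(1, 24)]        # stand-in row: variable v holds 100 + v
permuted = [raw[v - 1] for v in order]
indep, dep, weight, control = split_row(permuted, n_indep=18, n_dep=2, has_weight=True)
print(len(indep), dep, weight, control)      # 18 [116, 117] 119 [118, 123]
```

After the reordering, positions 22 and 23 hold original variables 18 and 23. That is why the $C-B card names (22) and (23).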

VI.   Footnotes 


1  Durbin-Watson coefficients are a measure of autocorrelation; their distribution lies between 0 and 4.
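The Durbin-Watson statistic is the sum of squared successive differences of the deviations divided by their sum of squares. A minimal Python sketch of that ratio (the function name is hypothetical) shows why it runs from 0, when successive deviations are identical, towards 4, when their signs alternate:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences of the residuals divided by
    the sum of squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(z * z for z in residuals)
    return num / den


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # 3.0 (sign changes every step)
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))     # 0.0 (no change between steps)
```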

2  If weights are not used, column 3 (sum of weights) is equal to the sample size (n).

3  If weights (Parameter 15) are used, two options are available. (1) Code Parameter 15 as -1 if the sum of weights (m) should be substituted for the sample size in all calculations (warning: this may show a higher significance than the data warrant). (2) Code Parameter 15 as 1 if the sum of weights should be substituted only as follows:


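Mean:        \bar{X}_j = \frac{\sum_{i=1}^{n} w_i x_{ij}}{m}

Covariance:  c_{jk} = \frac{(X'X)_{jk}}{n-1} - \frac{\left(\sum_{i=1}^{n} w_i x_{ij}\right)\left(\sum_{i=1}^{n} w_i x_{ik}\right)}{m\,(n-1)}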
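A minimal Python sketch of the two weighting options follows. It assumes the weighted cross-product is (X'X)_{jk} = \sum_i w_i x_{ij} x_{ik}, as in the REGRESSION-CORRELATION write-up. The function name and the `use_m_everywhere` flag are illustrative; the flag mimics Parameter 15 = -1, where m replaces n throughout.

```python
def weighted_mean_cov(rows, weights, use_m_everywhere=False):
    """Weighted means and covariances following the formulas above.

    rows: observations, each a list of p values; weights: one weight per row.
    use_m_everywhere=True mimics Parameter 15 = -1 (m replaces n everywhere);
    the default mimics Parameter 15 = 1 (m appears only where shown above).
    """
    n = len(rows)
    m = sum(weights)
    if use_m_everywhere:
        n = m
    p = len(rows[0])
    wsum = [sum(w * r[j] for w, r in zip(weights, rows)) for j in range(p)]
    means = [s / m for s in wsum]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            # Weighted cross-product (X'X)_jk = sum_i w_i x_ij x_ik (assumed).
            xtx = sum(w * r[j] * r[k] for w, r in zip(weights, rows))
            cov[j][k] = xtx / (n - 1) - wsum[j] * wsum[k] / (m * (n - 1))
    return means, cov


means, cov = weighted_mean_cov([[1, 2], [2, 4], [3, 6]], [1, 1, 1])
print(means, cov)   # [2.0, 4.0] [[1.0, 2.0], [2.0, 4.0]]

# With m substituted everywhere, a weight of 2 acts like a duplicated observation.
_, cov_w = weighted_mean_cov([[1], [2], [3]], [1, 1, 2], use_m_everywhere=True)
_, cov_d = weighted_mean_cov([[1], [2], [3], [3]], [1, 1, 1, 1])
print(abs(cov_w[0][0] - cov_d[0][0]) < 1e-12)   # True
```

The last check illustrates the warning in option (1): a weight of 2 behaves exactly like a second copy of the observation, so the effective sample size grows.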


The dependent-variable portion of the raw-data cross-products matrix is deleted, leaving the independent-variable portion X'X, which is then augmented with the column sums \sum X_j and the sample size n.
This augmented matrix is then inverted by partitioning; for j = 1, ..., p and k = 1, ..., p the elements of the inverse are obtained from the elements r^{-1}_{jk} of the inverse correlation matrix, scaled by the standard deviations s_j and s_k.

VII.   Coefficients  of  Linear  Dependency: 

If the correlation matrix is singular, a message is printed as follows:

    INPUT MATRIX IS SINGULAR.
    THE FOLLOWING ROWS ARE LINEARLY DEPENDENT:
    1  2  ...  N-1  N

Following this message, the coefficients are printed as follows:

    COEFFICIENTS OF LINEAR DEPENDENCY
                 N
    1      XXXX.XXXXX
    2      XXXX.XXXXX
    ...
    N-1    XXXX.XXXXX

The values are the unstandardized regression coefficients of variable N predicted from variables 1 through N-1. Variables whose coefficients are approximately zero are not part of the dependency.
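A Python sketch of what these coefficients mean follows. It assumes they come from regressing the last variable on the others through their covariances; the data and helper names are made up for illustration.

```python
def covariance(cols):
    """Sample covariance matrix of a list of equal-length columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    return [[sum((u - mu) * (v - mv) for u, v in zip(cu, cv)) / (n - 1)
             for cv, mv in zip(cols, means)]
            for cu, mu in zip(cols, means)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def dependency_coefficients(cols):
    """Unstandardized coefficients of the last column regressed on the others."""
    s = covariance(cols)
    k = len(cols) - 1
    return solve([row[:k] for row in s[:k]], [s[j][k] for j in range(k)])


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [0.0, 1.0, 0.0, 0.0, 1.0]
x4 = [2 * u - 0.5 * v + 1 for u, v in zip(x1, x2)]   # depends on x1 and x2 only
coefs = dependency_coefficients([x1, x2, x3, x4])
print([round(c, 4) for c in coefs])
```

The coefficients come out as about 2, -0.5 and 0. The near-zero value for x3 marks it as not part of the dependency, as described above.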


PARTIAL CORRELATION


General Description

At the user's option, this routine provides one of two common types of special-purpose correlation coefficients.


A.   Partial Correlations:

This program produces coefficients of net correlation of any order from 1 to 19 in matrix form. Coefficients of successively higher order may be obtained by repeated calls to the program, each time using the previously generated partial correlation matrix as input; alternatively, several variables may be held constant at the same time in a single call to the program.


The general equation used is:

r_{ij.abc\ldots n} = \frac{r_{ij.abc\ldots(n-1)} - r_{in.abc\ldots(n-1)}\,r_{jn.abc\ldots(n-1)}}{\sqrt{\left(1 - r^2_{in.abc\ldots(n-1)}\right)\left(1 - r^2_{jn.abc\ldots(n-1)}\right)}}
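The recursion can be applied repeatedly, removing one control variable per level, until only zero-order correlations remain. A Python sketch follows; it assumes a full correlation matrix and 0-based variable indices, both illustrative choices.

```python
import math


def partial_corr(r, i, j, controls):
    """Partial correlation r_{ij.controls} computed by the recursion formula.

    r is a full correlation matrix (list of lists, 0-based indices);
    controls lists the variables held constant; the last one is removed first.
    """
    if not controls:
        return r[i][j]
    *rest, n = controls
    r_ij = partial_corr(r, i, j, rest)
    r_in = partial_corr(r, i, n, rest)
    r_jn = partial_corr(r, j, n, rest)
    return (r_ij - r_in * r_jn) / math.sqrt((1 - r_in ** 2) * (1 - r_jn ** 2))


r = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.3],
     [0.4, 0.3, 1.0]]
print(round(partial_corr(r, 0, 1, [2]), 4))   # 0.4346
```

The naive recursion recomputes lower-order coefficients many times, which is fine for a handful of control variables; for higher orders, reusing a previously computed partial correlation matrix (as the paragraph above suggests) avoids that cost.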


References:

Mills, F. C., Statistical Methods, 3rd edition, Holt, Rinehart and Winston, New York, 1955.

B.   Tetrachoric Correlations:


This type of correlation coefficient is used when continuous, normally distributed variables are measured dichotomously.

This program is based on a program by Roald Buhler at Princeton University, which in turn is based on a 650 program written at the Educational Testing Service. The approximation used was developed by Professor Ledyard Tucker.

Restrictions 

A.  Partial  Correlations: 

Input matrices may be no larger than 140 x 140 and must be compatible with SOUPAC conventions. In most cases the original input to the program will be a matrix of zero-order correlations (see the CORRELATION program write-up).

B.  Tetrachoric  Correlations: 

This option is limited to 140 variables. All observations should be
coded  either  0  or  1.   The  program  generates  cross-count  tables  before 
computing  the  correlation  coefficients. 
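The Python sketch below shows the cross-count step and then estimates a tetrachoric coefficient from the table. The estimate uses the common cosine-pi approximation, a swapped-in textbook formula that is NOT the Tucker approximation this program uses, so its values will differ somewhat from SOUPAC's. When a cell is empty it returns None rather than applying this program's substitution rules (see Special Comments). Blanks are counted as 0, matching the note under Special Comments. Function names are hypothetical.

```python
import math


def cross_count(x, y):
    """2 x 2 cross-count table for two dichotomous variables.

    Returns (a, b, c, d): a = #(1,1), b = #(1,0), c = #(0,1), d = #(0,0).
    Anything other than 1 or "1" (blanks included) is counted as 0.
    """
    a = b = c = d = 0
    for u, v in zip(x, y):
        u = 1 if u in (1, "1") else 0
        v = 1 if v in (1, "1") else 0
        if u and v:
            a += 1
        elif u:
            b += 1
        elif v:
            c += 1
        else:
            d += 1
    return a, b, c, d


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    Not Tucker's approximation. Returns None when any cell is empty.
    """
    if 0 in (a, b, c, d):
        return None
    return math.cos(math.pi / (1 + math.sqrt(a * d / (b * c))))


x = [1, 1, 0, 0, " "]
y = [1, 0, 1, 0, 0]
print(cross_count(x, y))                              # (1, 1, 1, 2)
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 4))   # 0.809
```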


Parameters 


The program name, PARTIAL CORRELATION, should be followed by the following parameters:

Parameter
Number      Use or Meaning

1      Input address of R if partial correlations, or of raw data if tetrachoric correlations (R is a correlation matrix).

2      Output address of correlations desired.

3      0 if tetrachoric correlations are desired;
       1 if partial correlations are desired.

4-22   Variables to be held constant in computing partial correlations.


IV.   Special  Comments 

When a cell of the cross-count table is zero, or close enough to zero that the tetrachoric correlation cannot be computed by this approximation, a value of -1.0 is used if the empty cell is off-diagonal. If a diagonal cell is effectively zero (i.e., if a variable is all zero or all one), its correlations are set to 0.0.

Blanks are counted as zeros.

V.   Examples

A series of observations of 8 variables is used to obtain third-order partial correlations with variables 5, 7, and 8 held constant:

/*ID
// EXEC SOUPAC
//SYSIN DD *
CORRELATIONS (CARDS)()(SEQ 1).
PARTIAL CORRELATION (SEQ 1)(PRINT)(1)(5)(7)(8).
END SOUPAC
DATA(8)(8F6.2)
END#
REGRESSION-CORRELATION PROGRAM


I.   Purpose 

The purpose of this program is to perform correlation and regression analyses. It replaces the previous stand-alone programs Correlation, Canonical, Multiple Correlation, Step-Wise Multiple Correlation, and Partial Correlation.

This  program,  through  the  use  of  the  VARIABLE  subparameter  card,  can 
be  used  to  process  subsets  of  the  original  input  variables. 


II.   General  Description 

A.   Correlation Section

w = weights
m = sum of weights [m = \sum_{i=1}^{n} w_i]
n = sample size (if Main Parameter 7 < 0, n = m)
x_{ij} = raw data; jth variable, ith observation
S = vector of standard deviations
\bar{X} = means

Mean:                \bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}
                     [if weights are specified, \bar{X}_j = \frac{1}{m}\sum_{i=1}^{n} w_i x_{ij}]

Cross-Products:      (X'X)_{jk} = \sum_{i=1}^{n} x_{ij} x_{ik}
                     [if weights: (X'X)_{jk} = \sum_{i=1}^{n} w_i x_{ij} x_{ik}]

Covariance:          c_{jk} = \frac{(X'X)_{jk}}{n-1} - \frac{\left(\sum_i x_{ij}\right)\left(\sum_i x_{ik}\right)}{n(n-1)}
                     [if weights (Main Parameter 7 > 0): c_{jk} = \frac{(X'X)_{jk}}{n-1} - \frac{\left(\sum_i w_i x_{ij}\right)\left(\sum_i w_i x_{ik}\right)}{m(n-1)}]

Standard Deviation:  s_j = \sqrt{c_{jj}}

Correlation:         r_{jk} = \frac{c_{jk}}{\sqrt{c_{jj}\,c_{kk}}}
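A Python sketch of the unweighted formulas above, computing everything from the column sums and the raw cross-products exactly as written (the function name is hypothetical):

```python
import math


def correlation_stats(rows):
    """Means, covariances, standard deviations and correlations from raw
    cross-products, following the unweighted formulas above."""
    n = len(rows)
    p = len(rows[0])
    sums = [sum(r[j] for r in rows) for j in range(p)]
    means = [s / n for s in sums]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    cov = [[xtx[j][k] / (n - 1) - sums[j] * sums[k] / (n * (n - 1))
            for k in range(p)] for j in range(p)]
    sd = [math.sqrt(cov[j][j]) for j in range(p)]
    corr = [[cov[j][k] / math.sqrt(cov[j][j] * cov[k][k]) for k in range(p)]
            for j in range(p)]
    return means, cov, sd, corr


rows = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0]]
means, cov, sd, corr = correlation_stats(rows)
print(means, round(corr[0][1], 6))   # [2.5, 3.5] 0.8
```

Working from raw cross-products mirrors the formulas. In floating point, however, it can lose accuracy when the means are large relative to the spread; subtracting the means first is the usual safer route.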
B.   Simple Linear Regression

To solve the equation X_k = A_{jk} + B_{jk} X_j:

Regression Coefficients:  B_{jk} = r_{jk}\,\frac{s_k}{s_j}

Intercept:  A_{jk} = \bar{X}_k - B_{jk}\,\bar{X}_j
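These two formulas give the ordinary least-squares line of X_k on X_j. A minimal Python sketch (function name hypothetical):

```python
import math


def simple_regression(xj, xk):
    """Fit X_k = A_jk + B_jk * X_j using B_jk = r_jk * s_k / s_j."""
    n = len(xj)
    mj, mk = sum(xj) / n, sum(xk) / n
    sjk = sum((u - mj) * (v - mk) for u, v in zip(xj, xk)) / (n - 1)
    sj = math.sqrt(sum((u - mj) ** 2 for u in xj) / (n - 1))
    sk = math.sqrt(sum((v - mk) ** 2 for v in xk) / (n - 1))
    r = sjk / (sj * sk)
    slope = r * sk / sj                  # B_jk
    intercept = mk - slope * mj          # A_jk
    return intercept, slope


a, b = simple_regression([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 4.0])
print(round(a, 6), round(b, 6))   # 1.5 0.8
```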
C.   Partial Correlations

p = number of variables to be partialed out
q = total number of variables

The correlation matrix is partitioned as follows:

                 1 ... p     p+1 ... q
    1 ... p    [    A            B     ]
    p+1 ... q  [    B'           C     ]

P = C - B'A^{-1}B

P_{jk} \leftarrow \frac{P_{jk}}{\sqrt{P_{jj}\,P_{kk}}}

So the output is of the form

                 1 ... p     p+1 ... q
    1 ... p    [    I            0     ]
    p+1 ... q  [    0            P     ]
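The partitioned computation can be sketched in Python with a small Gaussian-elimination solver. This is an illustration under assumptions: the helper names are hypothetical. The function returns only the lower-right block P; the output described above also carries the identity and zero blocks. It solves A x = B column by column instead of forming A^{-1} explicitly.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def partial_by_partition(r, p):
    """Partial the first p variables out of correlation matrix r.

    Returns the (q-p) x (q-p) block P = C - B'A^{-1}B rescaled to a unit diagonal.
    """
    q = len(r)
    a = [row[:p] for row in r[:p]]            # p x p block A
    b = [row[p:] for row in r[:p]]            # p x (q-p) block B
    c = [row[p:] for row in r[p:]]            # (q-p) x (q-p) block C
    m = q - p
    # Column k of A^{-1}B, obtained by solving A x = (column k of B).
    ainv_b = [solve(a, [b[i][k] for i in range(p)]) for k in range(m)]
    pm = [[c[j][k] - sum(b[i][j] * ainv_b[k][i] for i in range(p))
           for k in range(m)] for j in range(m)]
    return [[pm[j][k] / math.sqrt(pm[j][j] * pm[k][k]) for k in range(m)]
            for j in range(m)]


r = [[1.0, 0.4, 0.3],
     [0.4, 1.0, 0.5],
     [0.3, 0.5, 1.0]]
pr = partial_by_partition(r, 1)
print(round(pr[0][1], 4))   # 0.4346
```

With one variable partialled out, this agrees with the first-order formula given in the PARTIAL CORRELATION write-up: (0.5 - 0.4 x 0.3) / \sqrt{(1 - 0.4^2)(1 - 0.3^2)}.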

D.   Canonical Correlations

The correlation matrix is partitioned as follows:

p = number of predictor variables
q = total number of variables

                 1 ... p     p+1 ... q
    1 ... p    [    A            B     ]
    p+1 ... q  [    B'           C     ]

Standardized regression coefficients:  \beta = A^{-1}B

R-Squared:  R^2 = B'A^{-1}B

Eigenvalues of C = D
Eigenvectors of C = H

Canonical Matrix:  (HD^{-1/2})'\,B'A^{-1}B\,(HD^{-1/2})

Eigenvalues (Canonical R^2):  \lambda_j^2 = eigenvalues of the Canonical Matrix

Canonical Correlations:  \Lambda (a diagonal matrix of the \lambda_j)
Wilks' Lambda:  \Lambda_j = \prod_{i \ge j} (1 - \lambda_i^2)

Chi-Square:  \chi_j^2, computed from \log_e \Lambda_j and the sample size n (for the jth function)

Eigenvectors of Canonical Matrix:  V

Criteria Weights:  W_c = (HD^{-1/2})V

Predictor Weights:  W_p = \beta W_c \Lambda^{-1}

E.   Multiple Correlation

p = number of independent variables
q = total number of variables

The correlation matrix is partitioned as follows:

                 1 ... p     p+1 ... q
    1 ... p    [    A            B     ]
    p+1 ... q  [    B'           C     ]

Regression Coefficients (Standardized):  \beta = A^{-1}B

Deviation Covariance:  D = S(C - B'\beta)S' \cdot \frac{n-1}{n-p-1}   (unexplained variances)

Deviation (Partial) Correlations:  D_{jk} / (D_{jj} D_{kk})^{1/2}
(correlations among the dependent variables with the independent variables partialled out)
Regression Covariance:  C_R = S\,B'\beta\,S' \cdot \frac{n-1}{p}   (explained variances)


Multiple Correlation:  R_j = \left[(B'\beta)_{jj}\right]^{1/2}

F Ratio:  F_j = \frac{R_j^2}{1 - R_j^2}\cdot\frac{n-p-1}{p}   (testing the hypothesis R = 0)

Standard Error of Estimate:  Se_j = s_j\left[(1 - R_j^2)\,\frac{n-1}{n-p-1}\right]^{1/2}

Regression Coefficients (Unstandardized):  b_{i.k} = \frac{s_k}{s_i}\,\beta_{i.k}

Dependent Variable Intercept:  b_{0.k} = \bar{Y}_k - \sum_{j=1}^{p} b_{j.k}\,\bar{X}_j
Standard Error of Regression Coefficients (Unstandardized)
Standard Error of Regression Coefficients (Standardized)

T = Regression Coefficient / Standard Error:  T_{j.k} = \frac{b_{j.k}}{Se(b_{j.k})}   (testing the hypothesis \beta = 0)

Predicted Dependent Variables:  y_k^* = b_{0.k} + \sum_{j=1}^{p} b_{j.k}\,x_j

Deviations from Observed:  z_j = y_j - y_j^*

Durbin-Watson Statistic:  d_j = \frac{\sum_{i=2}^{n} (z_{ij} - z_{i-1,j})^2}{\sum_{i=1}^{n} z_{ij}^2}

F.   Stepwise  Multiple  Regression 

In the step-wise procedure, intermediate results are used to give valuable statistical information at each step in the calculation. These intermediate answers are also used to control the method of calculation. A number of intermediate regression equations are obtained by adding one variable at a time, thus giving the following intermediate equations:


Y = B_0 + B_1X_1   where Y is the dependent variable

Y = B_0 + B_1X_1 + B_2X_2, etc.

The coefficients for each of these intermediate equations and the reliability of each coefficient are obtained by the step-wise procedure. The coefficients represent the best values when the equation is fitted by the variables included in the equation. At each step, the variable added is the one that makes the greatest improvement in "goodness of fit" or, stated another way, gives the greatest reduction in variance of the dependent variable.

A variable may be indicated to be significant at an early stage and enter the regression equation. After several other variables are added to the regression equation, a variable in the equation may be indicated to be insignificant. In this situation the step-wise regression procedure will remove the insignificant variable before adding an additional variable. Thus, at the various steps in the regression procedure, only those variables which are significant will be included in the regression equation.
The F level to enter a variable controls when variables enter the equation, and the F level to remove a variable likewise controls the removal of variables from the equation.

After  the  first  step  of  regression  subsequent  coefficient  and  error 
terms  depend  on  those  which  have  gone  before  in  an  iterative  manner. 

For example, the standardized regression coefficients result from a partial inversion of the correlation matrix (replacing the correlations with the dependent variable). The diagonal elements of this inverse are also used. The multiple correlation in turn comes from the regression coefficients. As the iteration proceeds with each step of regression, new coefficients result.


Standardized regression coefficients:  \beta_j, j = 1, ..., p, where p is the number of independent variables in the regression

Unstandardized regression coefficient:  b_j = \beta_j\,\frac{S_Y}{S_j}, where Y is the dependent variable

Multiple correlation:  R = \sqrt{\sum_{j=1}^{p} r_{jY}\,\beta_j}, where r_{jY} is the correlation of variable j with the dependent variable

Intercept:  a = \bar{Y} - \sum_j b_j \bar{X}_j

Standard error of mean of Y:  Se_{\bar{Y}} = S_Y\sqrt{1/(N-1)}

Standard error of predicted Y:  Se_{\hat{Y}} = S_Y\sqrt{(1 - R^2)/(N-p-1)}, where \hat{Y} is the predicted Y

Standard error of estimate:  Se_{est} = S_Y\sqrt{(1 - R^2)(N-1)/(N-p-1)} = Se_{\hat{Y}}\sqrt{N-1}

Standard error of unstandardized coefficient:  Se_{b_j} = \frac{Se_{\hat{Y}}}{S_j}\sqrt{D_j}, where D_j is the jth diagonal element of the partially inverted correlation matrix

Standard error of standardized coefficient:  Se_{\beta_j} = Se_{b_j}\,\frac{S_j}{S_Y}

T ratio:  T = \frac{\beta_j}{Se_{\beta_j}}

Degrees of freedom:  Df = N-p-1

The above are printed by default at the end of each iteration.
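The selection logic described above can be sketched in Python: add the variable that most reduces the residual variance if its F exceeds the F-to-enter level, and drop any variable whose F falls below the F-to-remove level. This is an illustration, not SOUPAC's code. Instead of the partial inversion the program uses, it recomputes R^2 for each candidate subset by solving the normal equations on the correlation matrix, which yields the same partial F values at a higher cost. It assumes the dependent variable is the last row/column; the function names and the default levels (4.0, echoing the *4.0* examples) are illustrative.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            f = m[i][col] / m[col][col]
            for c in range(col, n + 1):
                m[i][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(r, subset, y):
    """R^2 of variable y on the variables in subset, from correlation matrix r."""
    if not subset:
        return 0.0
    a = [[r[i][j] for j in subset] for i in subset]
    rhs = [r[i][y] for i in subset]
    beta = solve(a, rhs)                      # standardized coefficients
    return sum(bi * ri for bi, ri in zip(beta, rhs))


def stepwise(r, n, f_enter=4.0, f_remove=4.0, max_steps=50):
    """Forward/backward stepwise selection; the dependent variable is the last
    row/column of r and n is the sample size. Returns 0-based indices."""
    y = len(r) - 1
    selected = []
    for _ in range(max_steps):
        r2 = r_squared(r, selected, y)
        k = len(selected)
        if selected:
            # Removal check: partial F for dropping each variable in the equation.
            resid = (1 - r2) / (n - k - 1)
            f_out = {j: (r2 - r_squared(r, [v for v in selected if v != j], y)) / resid
                     for j in selected}
            weakest = min(f_out, key=f_out.get)
            if f_out[weakest] < f_remove:
                selected.remove(weakest)
                continue
        candidates = [j for j in range(y) if j not in selected]
        if not candidates:
            break
        # Entry check: partial F for adding each candidate.
        f_in = {}
        for j in candidates:
            r2_new = r_squared(r, selected + [j], y)
            f_in[j] = (r2_new - r2) / ((1 - r2_new) / (n - k - 2))
        best = max(f_in, key=f_in.get)
        if f_in[best] <= f_enter:
            break
        selected.append(best)
    return selected


r = [[1.0, 0.0, 0.0, 0.7],
     [0.0, 1.0, 0.0, 0.3],
     [0.0, 0.0, 1.0, 0.0],
     [0.7, 0.3, 0.0, 1.0]]
print(stepwise(r, n=100))   # [0, 1]
```

Keeping the F-to-enter level at or above the F-to-remove level avoids a variable cycling in and out; `max_steps` is only a safety stop for this sketch.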


III.   Parameters

A.   Main Parameter Card (Mnemonic: REG)

Parameter
Number      Use or Meaning

1      Input address of raw data (default), augmented cross-products, covariance, or correlation matrices (see notes on Type of Input).

2      Output address of means, standard deviations, sample size, and sum of weights (stored in 4 rows).

3      Output address of correlations (if subparameters are specified, the correlation matrix must be saved on a sequential file).

4      Output address of covariance matrix.

5      Output address of raw-data cross-products matrix.

6      Input address of labels.

7      Weights (1, 0, -1; default 0).

8      Type of input (see notes on Type of Input):
         0  Raw data (default)
         1  Cross products
         2  Covariance
         3  Correlation (augmented)
         4  Correlation (unaugmented)

9      Search for pivotal elements (for all inversions):
         0 or default - no search
         1  Perform search over entire matrix

10     Test for positive definiteness (for all inversions):
         0 or default - perform test
         1  Ignore test

11     Tolerance for all inversions (default *1.0E-5*); must be enclosed in asterisks, "* *".

12     Output address of augmented correlation matrix suitable for input (print not allowed; the correlation matrix must have been saved on a sequential file).


B.   Subparameter Cards

1.   CONTROL (Mnemonic: CON)

Must be the first subparameter card, if used.

1-20   Control variables

(Control variables must be eliminated for any further analysis. Only the first subset is output to a sequential file.)


2.   VARIABLE (Mnemonic: VAR)

Purpose: To specify a subset and order of variables to be used in all following subparameter cards until another VAR card is encountered.

The absence of any VAR cards, or a VAR card without parameters, indicates that all the input variables are to be used in their original order.

The parameters always refer to the order in which the variables were originally entered.

Form: Index sets (see below).

Examples:

1.  VAR(1)(2)(3)(5)(7).
    Variables 1, 2, 3, 5, and 7 are used (5 variables).

2.  VAR(1,2)(3,7).
    Variables 1 through 2 and 3 through 7 are used (7 variables).

3.  VAR(1,2)(3,7,2).
    Variables 1 through 2, and 3 through 7 in steps of 2 (3, 5, and 7), are used (5 variables).

Caution: If a variable is specified more than once, the subparameter cards will use the variable more than once. This may cause singular matrices in MUL or CAN.
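The three index-set forms in these examples, (v), (first,last) and (first,last,step), can be expanded mechanically. A small Python sketch follows; the function name is hypothetical, and it keeps repeated variables, in line with the Caution above.

```python
import re


def parse_var_card(card):
    """Return the variable numbers selected by a VAR card such as "VAR(1,2)(3,7,2).".

    Each parenthesized index set is (v), (first,last) or (first,last,step);
    ranges are inclusive. Returns None for a VAR card without parameters,
    meaning "all variables in their original order".
    """
    groups = re.findall(r"\(([^)]*)\)", card)
    if not groups:
        return None
    selected = []
    for g in groups:
        nums = [int(t) for t in g.split(",")]
        if len(nums) == 1:
            selected.append(nums[0])
        elif len(nums) in (2, 3):
            step = nums[2] if len(nums) == 3 else 1
            selected.extend(range(nums[0], nums[1] + 1, step))
        else:
            raise ValueError(f"unexpected index set: ({g})")
    return selected


print(parse_var_card("VAR(1)(2)(3)(5)(7)."))   # [1, 2, 3, 5, 7]
print(parse_var_card("VAR(1,2)(3,7)."))        # [1, 2, 3, 4, 5, 6, 7]
print(parse_var_card("VAR(1,2)(3,7,2)."))      # [1, 2, 3, 5, 7]
```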

3.   SIMPLE LINEAR REGRESSION (Mnemonic: SIM)

1   Output address of coefficients (B).

2   Output address of intercepts (A).

4.   PARTIAL CORRELATIONS (Mnemonic: PAR)

The first N variables are partialled out. The variables may be reordered by use of a VAR card. The correlations with those variables which are partialled out are set to 0, and the diagonal of the matrix is set to 1.

1   Number of variables to be partialled out.

2   Output address of partial correlation matrix.
5.   MULTIPLE  LINEAR  REGRESSION  (Mnemonic:   MUL) 


1   Number of independent variables.
    Ind. var. + dep. var. + wt. var. + control var. < 450.
Ind.  var.  +  dep.  var.  <  ^30. 
2   Output address of coefficients (unstandardized). SEQUENTIAL 1-15; PRINT is the default. Coefficients for P independent and L dependent variables are written as L rows with L + P + 1 columns each. The ith row contains, in order, the ith intercept term, the P coefficients for the ith dependent variable, a -1 in location P + i + 1, and 0's in all other locations. This format is compatible with the ECONOMETRICS REDUCED FORM AND RESIDUAL ANALYSIS program.

3   Output address of inverse of augmented independent-variables cross-products matrix. SEQUENTIAL 1-15 and/or PRINT.

4   Output address of predicted dependent variables. SEQUENTIAL 1-15 and/or PRINT (P(F) is also optional).

5   Output address of deviations from actual. SEQUENTIAL 1-15 and/or PRINT (P(F) is also optional).


Raw data must be stored on a sequential file for parameters 4, 5, and 6.

6   Output address of Durbin-Watson coefficients and sums of the second, third, and fourth powers of deviations. SEQUENTIAL 1-15 and/or PRINT.
      Row 1   Durbin-Watson coefficients
      Row 2   \sum (y - y^*)^2
      Row 3   \sum (y - y^*)^3
      Row 4   \sum (y - y^*)^4

7   Output address of deviation covariance matrix. SEQUENTIAL 1-15 and/or PRINT.

8   Output address of deviation (partial) correlation matrix. SEQUENTIAL 1-15 and/or PRINT.

9   Output address of regression covariance matrix. SEQUENTIAL 1-15 and/or PRINT.
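The coefficient layout in the description of parameter 2 (L rows, L + P + 1 columns, a -1 in location P + i + 1) can be built as follows. In this Python sketch the function name is hypothetical and the coefficients are made-up numbers.

```python
def coefficient_rows(intercepts, coefs):
    """Lay out MUL coefficients as described for parameter 2.

    intercepts: L values; coefs: L lists of P coefficients (one list per
    dependent variable). Returns L rows of L + P + 1 columns each.
    """
    L = len(intercepts)
    P = len(coefs[0])
    rows = []
    for i in range(L):
        row = [0.0] * (L + P + 1)
        row[0] = intercepts[i]
        row[1:P + 1] = coefs[i]
        row[P + i + 1] = -1.0      # 1-based location P + i + 1 for the (i+1)th row
        rows.append(row)
    return rows


rows = coefficient_rows([1.0, 2.0], [[0.5, 0.25], [0.1, 0.2]])
for row in rows:
    print(row)
# [1.0, 0.5, 0.25, -1.0, 0.0]
# [2.0, 0.1, 0.2, 0.0, -1.0]
```

With this layout, each row dotted with (1, x_1, ..., x_P, y_1, ..., y_L) gives intercept + \sum b x - y_i, which is zero at the predicted values. That is presumably what lets another program read the rows as a system of equations.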


6.   CANONICAL CORRELATIONS (Mnemonic: CAN)

The predictor variables must precede the criteria variables. Reordering may be done by use of a VAR card.

1   Number of predictors (must be greater than the number of criteria).

2   Output address of predictor weighting matrix.

3   Output address of criteria weighting matrix.

4   Output address of standardized regression coefficients (print only).

5   Output address of R^2 (print only).

6   Output address of standardized predictor weighting matrix.*

7   Output address of standardized criteria weighting matrix.*

8   Output address of eigenvalues (sequentials only), as a row.

*  Weighting matrices are standardized so that the sum of the squares for each function (column) is equal to 1.

7.   STEPWISE MULTIPLE REGRESSION

Only one dependent variable is allowed. It must be the rightmost variable.

1   "F" level to enter an independent variable into the regression equation. An example would be: *4.0*

2   "F" level to remove a variable from the regression equation. An example would be: *4.0*

3   1 if the constant term in the equation is assumed to equal zero (0). [If this option is used, the raw-data cross-products matrix must be saved on a sequential file.]

4   Output address of coefficients (sequential file only).

5   Output address of predicted dependent variables (print only). Input may not come from cards.

6   First (N) variables are placed in the regression first.

7   First (N) variables are to be kept in the regression once they are entered.

8   1 if intermediate steps of regression are not to be printed.


IV.   Explanation

A.   Type of Input (Main Parameter 8)

TYPE 0   RAW DATA is input.
TYPE 1   An augmented cross-products matrix is input. The form is as follows:

             [ n         \sum X ]
             [ \sum X    X'X    ]


TYPE 2   An augmented covariance matrix is input. The form is the covariance matrix c_{jk} bordered by the sums \sum X.

TYPE 3   An augmented correlation matrix is input. The form is the correlation matrix r_{jk} bordered by the means \bar{X} and the standard deviations S.


This option may be used to input a previously calculated matrix (Correlation, Covariance, or Cross-Products) for use as input to a suboperation.

B.  Weights (Main Parameter 7)

If the Weights Flag is ≠ 0 and raw data is input, then the weighting variable must be the rightmost variable.

If the Weights Flag is > 0, then the input matrices of type 1, 2, or 3 should be augmented as follows:


    TYPE 1   cross products X'X, bordered by \sum X, together with n and m
    TYPE 2   covariance c_{jk}, bordered by \sum X, together with n and m
    TYPE 3   correlations r_{jk} with their border, together with n and m



If  the  Weights  Flag  is  <  0  then  the  sum  of  weights  should  be 
substituted  for  n. 

C.   Miscellaneous 

1.  If suboperations are requested, the correlation matrix must be output to a sequential file. Be sure not to use this file as an output address for any subparameter card.

2.  All output is in the form of a matrix permuted in the order of the variables specified on a VAR card.

3.  All printed matrices are given variable numbers determined upon entry at the main input address.

4.  If predicted dependent variables and/or deviations from predicted values in MULTIPLE Regression are printed, the use of labels will increase the number of lines printed by 20% to 25%.


Examples

A.   Labels

The label deck must precede any data input to this program by cards. Up to 8 characters may be used to label each variable; therefore the format field may be up to 8 columns in width. The labels should be left-justified within each field.


REG(C)()(S1)()()(C)
ENDP
ENDS
DATA(7)(7A8)
ONE     TWO     THREE   FOUR    FIVE    SIX     SEVEN
END#
DATA(7)(7F2.0)
END#

B.   VARIABLE Card


1.   MULTIPLE CORRELATIONS

To run Multiple Correlation on the following equations:

#1   X_5 = b_0 + b_1X_1 + b_3X_3
#2   X_1 = b_0 + b_2X_2 + b_4X_4 + b_5X_5 + b_6X_6

REG(C)(P)(S1/P).
VAR(1)(3)(5).
MUL(2).
VAR(2,6,2)(5)(1).
MUL(4).
ENDP

2.   CANONICAL Correlations

a.   To permute the predictor and criteria variables:

REG(C)()(S1).
VAR(7,14)(1,6).
CAN(8)(P)(P)(P)(P).
ENDP

b.   To use different subsets of predictor or criteria variables:

\sum_i b_i X_i = \sum_j b_j X_j

#1   i = 7, 8, 9, 14;  j = 1, 2, 3
#2   i = 7, 8, 9, 10, 11;  j = 1, 2, 4, 5, 6

REG(C)()(S1).
VAR(7,9)(14)(1,3).
CAN(4)(P)(P)(P)(P).
VAR(7,11)(1,2)(4,6).
CAN(5)(P)(P)(P)(P).
ENDP




3.   PARTIAL Correlations

To specify the order of variables, where the first n variables will be partialled out:

#1   r_{ij.789}
#2   r_{ij.1578}

(assume 10 variables)

REG(C)()(S1).
VAR(7,9)(1,6)(10).
PAR(3)(P).
VAR(1,5,4)(7,8)(2,4)(6)(9,10).
PAR(4)(P).
ENDP

C.   Input Types

1.   Augmented Cross Products    TYPE=1

MAT.
GEN*S2*1*.
MOV(C)(S1).   RAW DATA
EXP(S2)(S1)(S3).
HOR(S3)(S1)(S2).
TRA(S2)(S3).
MUL(S3)(S2)(S1).
ENDP
REG(S1)(P)(S2/P)()()()()(1)
ENDP
ENDS
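The MATRIX steps above appear to place a column of ones beside the raw data (HOR) and then multiply the transpose of that matrix by the matrix itself (TRA, MUL). That produces the TYPE 1 form, with n in the corner and the column sums as the border. This reading assumes GEN and EXP build the column of ones, which this write-up does not describe. A Python sketch of the resulting matrix (function name hypothetical):

```python
def augmented_cross_products(rows):
    """Form [1 | X]' [1 | X]: the raw cross-products matrix bordered by
    n (top-left corner) and the column sums."""
    aug = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(aug[0])
    return [[sum(r[j] * r[k] for r in aug) for k in range(p)] for j in range(p)]


print(augmented_cross_products([[1, 2], [3, 4], [5, 6]]))
# [[3.0, 9.0, 12.0], [9.0, 35.0, 44.0], [12.0, 44.0, 56.0]]
```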


2.   Augmented Correlation    TYPE=3

REG(C)()(S1)()()()()()()()()(S2).
ENDP
REG(S2)()(S1)()()()()(3).
ENDP
ENDS

3.   Unaugmented Correlation    TYPE=4

COR(C)()(S1).
REG(S1)()(S2)()()()()(4).
ENDP
D.  CONTROL Card

The CONTROL card is similar in use to a $C-B card with the CORRELATIONS program. The following forms are equivalent:

REG(C)(P)(S1/P).
CON(5)(9)(11).
ENDP

COR(C)(P)(S1/P).
$C-B(5)(9)(11).

E.  Equivalent Forms

This section shows the forms needed to replace the current stand-alone programs with the Regression Program.

The superscripts indicate like uses of the parameters between the two forms.

1.  Correlations

a.   COR(C)(P)(P)(P)(P)(P)(P).

     REG(C)(P)(S1/P)(P)(P).
     SIM(P)(P).
     ENDP

b.   COR(C)(P)(P).

     REG(C)(P)(P).
     ENDP

2.  Multiple Correlations

a.   MUL(S1)(6)(P)(P)(P)(P)(P)(P)(P)(P)(P)(P)(F)(P).
     REG(S1)(P)(S2/P)(P)(P).

2    6    1'4  3    <♦    13101112 

     MUL(6)(P)(P)(P)(P)(P)(P)(P)(P).

ENDP 


b.   MUL(C)(5)()()(P)()(P)

REG(C)(P)(S1/P). 

MUL(5). 
ENDP 


3.   Canonical Correlations

a.   COR(C)()(S1).

     CAN(S1)(50)(P)(P)(8)(6)(1)(1)(1)
     REG(C)()(S1).

     CAN(8)(P)(P)(P)(P).

     ENDP

b.   CAN(C)()(P)(P)(8)(6)(1)(1)(1)

     REG(C)()(S1)()()()()(4).

     CAN(8)(P)(P)(P)(P).
     ENDP

4.   Partial Correlations
1 

a.  COR(C)( )(S1). 

PAR(S1)(P)(1)(1)(2)(3). 

1 
REG(C)()(S1). 

2 

PAR(3)(P). 
ENDP 

b.  PAR(C)(P)(1)(1)(2)(3).
    REG(C)()(S1)()()()()(4).

2 

PAR(3)(P). 

ENDP 


Addendum: The following additional subparameter card is available.

TRANSFORMATIONS (Mnemonic: R-T)

A Fisher Z transformation of the correlation matrix is output to a specified address; the transformed correlations will be approximately normally distributed. A T-transformation may also be output; the transformed values follow a t-distribution with n-2 degrees of freedom.

1   Output address of Fisher Z transformations.
2   Output address of T-transformations.
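The Python sketch below uses the standard textbook definitions, z = atanh(r) and t = r\sqrt{n-2}/\sqrt{1-r^2}. These are an assumption about what this card computes, not a statement of its exact method. The t form is the one that follows a t-distribution with n-2 degrees of freedom when the population correlation is zero. Function names are hypothetical.

```python
import math


def fisher_z(r):
    """Fisher Z transformation: z = 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)


def t_transform(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), referred to n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


print(round(fisher_z(0.5), 6))         # 0.549306
print(round(t_transform(0.6, 27), 6))  # 3.75
```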
STEP-WISE MULTIPLE CORRELATION

I.   General Description

The STEP-WISE MULTIPLE CORRELATION program calculates the following basic statistics before the step-wise procedure begins. All variables are included.


Mean:  \bar{X}_i = \frac{\sum X_i}{N}

Crossproducts:  P_{ij} = \sum (X_i X_j)

Covariance:  S_{ij} = \frac{N\sum(X_i X_j) - (\sum X_i)(\sum X_j)}{N(N-1)}

Standard Deviation:  S_i = (S_{ii})^{1/2}

All summation is over the sample.

Product Moment Correlation:  r_{ij} = \frac{S_{ij}}{S_i S_j}

In  the  step-wise  procedure,  intermediate  results  are  used  to  give 
valuable  statistical  information  at  each  step  in  the  calculation.   These 
intermediate  answers  are  also  used  to  control  the  method  of  calculation. 
A  number  of  intermediate  regression  equations  are  obtained  by  adding  one 
variable  at  a  time  thus  giving  the  following  intermediate  equations. 

a.  Y = B_0 + B_1X_1, where Y is the dependent variable.

b.  Y = B_0 + B_1X_1 + B_2X_2, etc.

The coefficients for each of these intermediate equations and the reliability of each coefficient are obtained by the step-wise procedure. The values and reliability may vary with each subsequent equation. The coefficients represent the best values when the equation is fitted by the variables included in the equation. At each step, the variable added is the one that makes the greatest improvement in "goodness of fit" or, stated another way, gives the greatest reduction in variance of the dependent variable.

A  variable  may  be  indicated  to  be  significant  at  an  early  stage  and 
enter  the  regression  equation.   After  several  other  variables  are  added  to 
the  regression  equation,  a  variable  in  the  equation  may  be  indicated  to  be 
insignificant.   Under  this  situation  the  step-wise  regression  procedure  will 
remove  the  insignificant  variable  before  adding  an  additional  variable. 
Thus,  at  the  various  steps  in  the  regression  procedure,  only  those  variables 
which  are  significant  will  be  included  in  the  regression  equation. 

The F level to enter a variable controls when variables enter the equation, and the F level to remove a variable likewise controls the removal of variables from the equation.

The last step in the step-wise procedure predicts the value of the dependent variable for each set of observations based on the final regression equation. Deviations between the actual and predicted values are also calculated (see parameter 4).
II.   Calculations and Formulas

After the simple correlations and the first step of regression, subsequent coefficients and error terms depend on those which have gone before in an iterative manner.

For example, the standardized regression coefficients result from a partial inversion of the correlation matrix (replacing the correlations with the dependent variable). The diagonal elements of this inverse are also used. The multiple correlation in turn comes from the regression coefficients. As the iteration proceeds with each step of regression, new coefficients result.

Standardized regression coefficients:  \beta_i, i = 1, ..., k, where k is the number of independent variables in the regression

Unstandardized regression coefficient:  B_i = \beta_i\,\frac{S_Y}{S_i}, where Y is the dependent variable

Multiple correlation:  R = \sqrt{\sum_{i=1}^{k} r_{iY}\,\beta_i}, where r_{iY} is the correlation of variable i with the dependent variable

Intercept:  C = \bar{Y} - \sum_i B_i \bar{X}_i

Standard error of mean of Y:  Se_{\bar{Y}} = S_Y\sqrt{1/(N-1)}, where N is the sample size

Standard error of predicted Y:  Se_{\hat{Y}} = S_Y\sqrt{(1 - R^2)/(N-k-1)}, where \hat{Y} is the predicted Y

Standard error of estimate:  Se_{est} = S_Y\sqrt{(1 - R^2)(N-1)/(N-k-1)} = Se_{\hat{Y}}\sqrt{N-1}

Standard error of unstandardized coefficient:  Se_{B_i} = \frac{Se_{\hat{Y}}}{S_i}\sqrt{D_i}, where D_i is the ith diagonal element of the partially inverted correlation matrix

Standard error of standardized coefficient:  Se_{\beta_i} = Se_{B_i}\,\frac{S_i}{S_Y}

T ratio:  T = \frac{\beta_i}{Se_{\beta_i}}

Degrees of freedom:  Df = N-k-1


III.   References

A. Ralston and H. S. Wilf, Mathematical Methods for Digital Computers, New York, Wiley and Sons, 1960, pp. 191-195.
IV.   Restrictions

The  dependent  variable  must  be  after  all  the  independent  variables  in 
each  row.   Only  one  dependent  variable  is  allowed  per  program  call. 

The input data to this program may come from any source conforming to SOUPAC. Output is to PRINT only, except for the coefficients (see parameters).

V.   Output 

The following is normally printed: Crossproducts, Means and Standard Deviations, Covariance, Correlations, and Standard Error of Mean of Dependent Variable. In addition, for each step of regression: F level, Standard Error of Predicted Dependent Variable, Multiple Correlation, Standard Error of Estimate, Dependent Variable Intercept, and Degrees of Freedom for F. And, for each independent variable in the step: Unstandardized Regression Coefficient, Standard Error of Unstandardized Regression Coefficient, Standardized Regression Coefficient, Standard Error of Standardized Regression Coefficient, T, and Degrees of Freedom for T. All printing may be suppressed except for the final step of regression.

The coefficients that may be requested on temporary storage are unstandardized. The output row consists of the intercept, the coefficients, and a minus 1. Any independent variable not entered into the regression gets a zero coefficient in the output row.
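Below is a short Python sketch of the output row, using a hypothetical helper name. The guide does not state the purpose of the trailing -1. One property that does hold: the dot product of the row with (1, x_1, ..., x_p, y) equals the predicted Y minus the observed Y for that case.

```python
import numpy as np


def coefficient_row(intercept, coefs, entered, p):
    """Row [C, B_1, ..., B_p, -1]; variables never entered get a zero coefficient."""
    row = np.zeros(p + 2)
    row[0] = intercept
    for j, b in zip(entered, coefs):
        row[1 + j] = b
    row[-1] = -1.0
    return row


# Variables 1 and 3 of p = 3 entered (0-based indices 0 and 2).
row = coefficient_row(2.0, [0.5, -1.0], entered=[0, 2], p=3)
case = np.array([1.0, 4.0, 9.0, 2.0, 3.0])   # (1, x_1, x_2, x_3, y)
pred_minus_obs = row @ case                  # predicted Y minus observed Y
```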

VI.   Parameters 


The parameters for the STEP-WISE MULTIPLE CORRELATION program appear on
the program call card.  They must follow the program name in this order:


Parameter
Number          Description

  1             Input Address.  CARDS or SEQUENTIAL 1-15.
                (See parameter 4 for special conditions.)

  2             "F" level to enter an independent variable into
                the regression equation.  An example would be:
                *4.0*

  3             "F" level to remove a variable from the regression
                equation.  An example would be: *4.0*

  4             Set this parameter to 1 if the predicted dependent
                variables are to be calculated.  (If this option is
                needed, input data must not be from cards.)  0 or
                blank if not wanted.

  5             1 if the constant term in the equation is assumed
                to equal zero (0).

  6             1 if a weighting factor is to be used.  (If a
                weighting factor is used, it must be the last
                variable in the input data row.)

  7             1 if intermediate steps of regression are not to
                be printed.

  8             1 if the cross-product matrix is not to be printed;
                2 if the input data is an (N+1) x (N+1) array in the
                following form, where N is the number of variables:
                rows and columns 1 through N hold the correlation
                matrix; column N+1 of rows 1 through N holds the
                means; row N+1 of columns 1 through N holds the
                standard deviations; and the last element, in row
                N+1 and column N+1, holds the sample size.

  9             1 if means and standard deviations are not to be
                printed.

 10             1 if the covariance matrix is not to be printed.

 11             1 if correlations are not to be printed.

 12             Tolerance used to determine when singularities are
                assumed to occur.  If this parameter is left blank,
                10^-5 is used.  To change it, supply a value such as
                *1.E-10*, where any number may be substituted for
                the 10.

 13             Output (intermediate storage) of coefficients.

 14             (N): the first N variables are placed in the
                regression first.

 15             (N): the first N variables are kept in the
                regression, if entered.


VII.   Special Comments

The dependent variable must be the last variable in the input row, unless a weighting factor is used; in that case the dependent variable must be the next-to-last variable in the input row.


DISTRIBUTION  ANALYSIS  PACKAGE 


FIT 


(CHI-SQUARE GOODNESS-OF-FIT TEST)


I.   General  Description 

In statistical applications, certain assumptions are frequently made about the probability distribution of a random variable. A common assumption is that a particular variable is normally distributed. The question that arises is "how valid an assumption was this?" A method of testing this assumption (or hypothesis) is the CHI-SQUARE GOODNESS-OF-FIT TEST.

This program provides tests of the hypothesis that the user's data is (1) a random sample from the distribution $P_\theta$, where $\theta$ is a user-specified parameter, or (2) a random sample from a class, $P_\theta$, of distributions where $\theta$ is not specified. These will be called tests of type 1 and type 2, respectively.

Distributions  which  can  currently  be  tested  in  the  program  are  the 
binomial,  Poisson,  normal,  gamma  and  continuous  rectangular  distributions. 
The  user  may,  for  example,  wish  to  test  the  hypothesis  that   his  data  was 
from  a  normal  distribution  with  variance  1,  for  some  mean. 


Based on the information the user provides, the program chooses a set of points $x_1 < x_2 < \cdots < x_n$ in the range of the distribution under consideration. Let $x_0$ and $x_{n+1}$ be the lower and upper ends of that range. The test then compares the observed number $o_i$ of observations in each interval $[x_{i-1}, x_i)$ with the expected number

$$e_i = m \, P_{\hat\theta}\{x_{i-1} \le Y < x_i\}, \qquad 1 \le i \le n+1,$$

where $m$ is the number of observations and $Y$ has the hypothesized distribution. Here $\hat\theta$ is the user-specified value of $\theta$ if one was given. Otherwise, $\hat\theta$ is the maximum likelihood estimator under a hypothesis of form (2) above. The comparison of $o_i$ with $e_i$ is made by the statistic

$$X^2 = \sum_{i=1}^{n+1} \frac{(o_i - e_i)^2}{e_i}.$$

When the hypothesis is true and the sample is large, this statistic has an approximate $\chi^2$ distribution (see Billingsley, 1961). The program prints the computed value of $X^2$ and the number of degrees of freedom for the test.
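The statistic itself takes only a few lines of Python (the function names are illustrative). The guide does not say how FIT counts degrees of freedom. The helper below uses the usual textbook rule (cells minus 1 minus the number of parameters estimated from the data); treat that rule as an assumption, not as SOUPAC's documented behavior.

```python
def chi_square_statistic(observed, expected):
    """X^2 = sum of (o_i - e_i)^2 / e_i over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per cell")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_df(n_cells, n_estimated=0):
    """Textbook rule (assumed): cells - 1 - parameters estimated from the data."""
    return n_cells - 1 - n_estimated


# Made-up counts for four cells with 25 expected in each.
x2 = chi_square_statistic([18, 22, 20, 40], [25, 25, 25, 25])   # 308/25 = 12.32
```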


If  one  sample,  with  m  observations,  is  to  be  tested,  input  to  the  program 
from  cards  or  sequential  will  be  one  variable  with  m  observations.   More 
than  one  sample  may  be  tested  at  a  time,  if  the  same  distribution,  or  in  some 
cases  a  distribution  from  the  same  class,  is  the  one  being  considered.   Thus 
a  test  of  each  of  three  samples,  all  of  which  are  on  the  same  sequential 
storage  or  card  deck,   may  be  performed. 
II.   Distributions Available


NOTATION: In the descriptions below, f(x) denotes the value of the probability function (discrete case) or density function (continuous case) at the point x. F(x) is the cumulative distribution function, defined as $F(x) = \Pr\{X \le x\}$.


A.   DISCRETE DISTRIBUTIONS


The distributions are classified here as being "discrete" (i.e., having positive probability at a countable number of values of the random variable they describe) or "continuous" (having positive density over a continuous range of values of the random variable).

1.   BINOMIAL(N, p)    N a positive integer;  0 < p < 1

$$f(x) = \binom{N}{x} p^x (1-p)^{N-x}, \qquad x \text{ an integer such that } 0 \le x \le N$$

$$F(x) = \sum_{i=0}^{x} \binom{N}{i} p^i (1-p)^{N-i}, \qquad x \text{ an integer such that } 0 \le x \le N$$


This distribution is appropriate, for example, where a random variable is sampled (observed) independently N times, and where each observed value can be classified precisely as being in one of two sets, often denoted "success" and "failure," with probabilities p and q = 1 - p respectively. In that case x is the number of successes.
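The binomial formulas translate directly into code. Below is a minimal Python sketch that uses only the standard library (`math.comb`, Python 3.8 or later). The function names are illustrative and are not part of SOUPAC.

```python
from math import comb


def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for an integer 0 <= x <= n."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """F(x) = sum of f(i) for i = 0, ..., x."""
    return sum(binomial_pmf(i, n, p) for i in range(0, min(x, n) + 1))
```

For the B(10, 1/2) population used in the FIT examples, `binomial_pmf(5, 10, 0.5)` is 252/1024.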


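2.   POISSON(λ)    λ > 0

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x \ge 0 \text{ and } x \text{ an integer}$$

$$F(x) = \sum_{i=0}^{x} \frac{\lambda^i e^{-\lambda}}{i!}, \qquad x \ge 0$$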


The  Poisson  distribution  is  often  used  as  an  approximation 
to  the  binomial  when  the  number  of  observations  is  large, 
p  =  Pr  {success}   is  small,  and  the  product  n  *  p  is  essentially  a 
constant.   The  Poisson  is  a  pervasive  distribution  in  its  own 
right,  arising  in  situations  where  the  probability  of  no  occurrences, 
or  of  one  occurrence  of  a  phenomenon  in  a  unit  of  time  is  moderate, 
where  the  probability  of  more  than  one  occurrence  is  essentially 
negligible  in  comparison,  and  where  the  frequencies  of  occurrence  in 
adjacent  intervals  are  independent  of  each  other. 
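Below is a Python sketch of the Poisson formulas, together with a numerical illustration of the binomial approximation described above. The function names and the choice n = 1000, p = 0.002 are illustrative.

```python
from math import comb, exp, factorial


def poisson_pmf(x, lam):
    """f(x) = lam^x e^(-lam) / x! for an integer x >= 0."""
    if x < 0:
        return 0.0
    return lam**x * exp(-lam) / factorial(x)


def poisson_cdf(x, lam):
    """F(x) = sum of f(i) for i = 0, ..., x."""
    return sum(poisson_pmf(i, lam) for i in range(0, x + 1))


# Binomial(n = 1000, p = 0.002) versus its Poisson approximation with lam = n * p = 2.
n, p = 1000, 0.002
gaps = [abs(comb(n, x) * p**x * (1 - p) ** (n - x) - poisson_pmf(x, n * p))
        for x in range(6)]
```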


B.   CONTINUOUS DISTRIBUTIONS


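1.   NORMAL(μ, σ²)    -∞ < μ < +∞,  σ² > 0

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \qquad -\infty < x < +\infty$$

$$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right] dt, \qquad -\infty < x < +\infty$$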


Through the Central Limit Theorem, this distribution has been justified as a description of a tremendous variety of phenomena in which the random variable under consideration is assumed to be the sum of a large number of independent random variables, each making a small contribution to the total.
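The integral for F(x) has no closed form, but it can be evaluated through the error function. Below is a Python sketch with illustrative names. The parameters follow NORMAL(μ, σ²), so the third argument is the variance, not the standard deviation.

```python
from math import erf, exp, pi, sqrt


def normal_pdf(x, mu, var):
    """f(x) for NORMAL(mu, var), where var is the variance sigma^2."""
    return exp(-((x - mu) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)


def normal_cdf(x, mu, var):
    """F(x) via the error function: (1 + erf((x - mu) / sqrt(2 var))) / 2."""
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))
```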

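2.   GAMMA(α, β)    α > 0,  β > 0

$$f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \qquad x > 0$$

$$F(x) = \int_{0}^{x} \frac{\beta^\alpha t^{\alpha-1} e^{-\beta t}}{\Gamma(\alpha)}\, dt$$

where $\Gamma(\alpha) = (\alpha-1)\,\Gamma(\alpha-1)$, and so if $\alpha$ is a positive integer, $\Gamma(\alpha) = (\alpha-1)!$.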

For a positive integer α, the gamma distribution is the sampling distribution of the sum of α independent, identically distributed "negative exponential" random variables, with β = λ, where λ is the parameter of the negative exponential distribution. The negative exponential distribution is itself the special case gamma(1, β). The gamma distribution is used for waiting times in processes where the number of occurrences in a given interval has a Poisson distribution. The "chi-square" distribution is another important special case of the gamma distribution:

$$\chi^2(r) \equiv \text{gamma}(r/2,\ 1/2).$$
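Below is a Python sketch of the gamma density and its two special cases, with illustrative names. Here β is a rate parameter, as in the formula above. (The Kolmogorov-Smirnov program described later writes its gamma with a scale parameter B instead.)

```python
from math import exp, gamma


def gamma_pdf(x, alpha, beta):
    """f(x) = beta^alpha x^(alpha - 1) e^(-beta x) / Gamma(alpha) for x > 0."""
    if x <= 0:
        return 0.0
    return beta**alpha * x ** (alpha - 1) * exp(-beta * x) / gamma(alpha)


def exponential_pdf(x, lam):
    """Negative exponential as the special case gamma(1, lam)."""
    return gamma_pdf(x, 1.0, lam)


def chi_square_pdf(x, r):
    """Chi-square with r degrees of freedom as gamma(r/2, 1/2)."""
    return gamma_pdf(x, r / 2.0, 0.5)
```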


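3.   RECTANGULAR(θ₁, θ₂)    θ₁ < θ₂

$$f(x) = \frac{1}{\theta_2 - \theta_1}, \qquad \theta_1 \le x \le \theta_2$$

$$F(x) = \int_{\theta_1}^{x} \frac{dt}{\theta_2 - \theta_1} = \frac{x - \theta_1}{\theta_2 - \theta_1}, \qquad \theta_1 \le x \le \theta_2$$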


This distribution is appropriate where all points of an interval (θ₁, θ₂) are equally likely, so that subintervals of equal length have equal probability, and no observations fall outside the interval.
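Below is a Python sketch of the rectangular distribution, with illustrative names. If [θ₁, θ₂] is cut into N equal intervals, each interval has probability 1/N under F(x), so each expected count is m/N for a sample of m observations.

```python
def rectangular_pdf(x, theta1, theta2):
    """f(x) = 1 / (theta2 - theta1) on [theta1, theta2], zero elsewhere."""
    return 1.0 / (theta2 - theta1) if theta1 <= x <= theta2 else 0.0


def rectangular_cdf(x, theta1, theta2):
    """F(x) = (x - theta1) / (theta2 - theta1), clamped to [0, 1]."""
    if x <= theta1:
        return 0.0
    if x >= theta2:
        return 1.0
    return (x - theta1) / (theta2 - theta1)


def rectangular_expected(m, n_intervals):
    """Expected counts when [theta1, theta2] is cut into n equal intervals."""
    return [m / n_intervals] * n_intervals
```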

III.   Use  of  the  Program 

A.  PRELIMINARY 

The program is invoked by the main parameter card (a SOUPAC "Program Card"). Information about the distribution and the intervals to be used is supplied on subparameter cards for the program.

Input is in the form of column vectors, and may be from cards or sequential storage. Each variable (vector of data) will be tested against the same type of distribution, i.e., information provided on the distribution card applies to all variables.

The subparameter card is the means of supplying information to the program about the distribution to be used, as well as the number and size of the intervals to be used in the tests. In some cases the user must supply a distribution parameter.

B.  MAIN PARAMETER CARD

Immediately following the program name CHI-SQUARE GOODNESS-OF-FIT
(mnemonic: FIT), the following parameters are listed.


Parameter
Number          Use or Meaning

  1             Input Address.  May be CARDS or
                SEQUENTIAL (S1, S2, etc.).


SUBPARAMETER  CARDS 


GENERAL  DESCRIPTION 


One "distribution card," chosen from the list below, should appear with each call to the program. For tests of type 1 ("simple hypothesis"), the parameters of the distribution are specified on this card. For distributions allowing tests of type 2 ("composite hypothesis"), one or more of these parameters may be left blank. Other parameters on the card set the intervals used to determine observed and expected frequencies. Because this test is only "asymptotically valid," the intervals should be chosen so that the expected frequency of any interval does not fall below 5.

DISTRIBUTION  CARDS 


Where a parameter is enclosed in asterisks (* *), it must be a decimal point number; where it is enclosed in parentheses ( ), an integer is required.
DISTRIBUTION

BINOMIAL(M)*P*(N).

M and P are parameters of the distribution: P is the probability of success on a trial, and M is the number of trials. N is the number of points (integral values starting at 0) to be grouped in each interval. For a test of type 2, the parameter P should be left blank.

POISSON*A*(ENDPOINT)(N).

A is the Poisson density parameter, and may be left blank for a test of type 2. N is the number of adjacent points to be grouped in each interval until ENDPOINT is reached. The interval [ENDPOINT, +∞) is then the last interval.

RECTANGULAR*θ1**θ2*(N).

θ1 and θ2 are the endpoints of the range of the distribution, θ1 < θ2. N is the number of equal-sized intervals to use for the test. Only tests of type 1 are allowed with this distribution.

NORMAL*μ**σ²**STPT**EP*(N).

μ is the mean of the distribution and σ² is the variance. Either or both of these parameters may be left blank for a test of type 2. STPT and EP are the start point and end point of an interval to be broken up into N equal-sized intervals for the test. The additional intervals (-∞, STPT) and [EP, +∞) will also be used.

GAMMA*α**β**ENDPOINT*(N).

*α* is the degrees-of-freedom parameter, and must be specified. *β* is the density parameter; it may be specified for a type 1 test or left out for a type 2 test. The portion of the real line between 0 and ENDPOINT will be divided into N equal-sized intervals for the test. One additional interval, [ENDPOINT, +∞), will be included.


IV.   Restrictions 


A  maximum  of  50  input  variables  (each  variable  a  sample  to  be  tested) 
may  be  input  to  the  program.   Only  one  distribution  or  distribution  type 
(e.g.  NORMAL  with  variance  1  and  any  mean)  may  be  tested. 
V.   Examples


/*ID
//     EXEC  SOUP
//SYSIN  DD  *
FIT(CARDS).
NOR*2.0****-10**14*(24).
END P
END S
DATA(1000,1)(21X,F4.1)


This program will test the hypothesis that the sample is from a normal distribution with a mean of 2, for some variance. The interval [-10, 14) will be divided into 24 pieces, each of length 1, for the test.
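The cell layout in this example can be sketched in Python. With STPT = -10, EP = 14 and N = 24, the card yields 24 unit-width cells plus the two tail cells, 26 cells in all. The variance is left blank (a type 2 test), so the sketch estimates it by maximum likelihood from the raw observations, with the mean fixed at 2. The guide does not say whether FIT estimates from raw or grouped data, so that step is an assumption. The function name and the sample values are made up.

```python
from bisect import bisect_right
from math import erf, inf, sqrt


def normal_cdf(x, mu, var):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))   # erf(+-inf) = +-1


def fit_normal_cells(sample, mu, stpt, ep, n):
    """Observed and expected counts for (-inf, STPT), n equal pieces of [STPT, EP), [EP, +inf)."""
    m = len(sample)
    var = sum((y - mu) ** 2 for y in sample) / m    # ML variance with mu known (assumption)
    width = (ep - stpt) / n
    edges = [-inf] + [stpt + i * width for i in range(n + 1)] + [inf]
    observed = [0] * (len(edges) - 1)
    for y in sample:
        observed[bisect_right(edges, y) - 1] += 1    # cell i: edges[i] <= y < edges[i + 1]
    expected = [m * (normal_cdf(hi, mu, var) - normal_cdf(lo, mu, var))
                for lo, hi in zip(edges[:-1], edges[1:])]
    return observed, expected, var


sample = [2.0 + 0.5 * (k % 13 - 6) for k in range(100)]   # made-up data
obs, exp_counts, var_hat = fit_normal_cells(sample, 2.0, -10.0, 14.0, 24)
```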


/*ID
//     EXEC  SOUP
//SYSIN  DD  *
FIT(CARDS).
BIN(10)*1/2*(2).
END P
END S
DATA(600,1)(26X,F^
#END
/*

This program will test the simple (type 1) hypothesis that the sample was from a B(10, 1/2) population. Note that with the given number of observations (600), adjacent pairs of points had to be grouped into the same interval to ensure a reasonable expected frequency in every cell.


THE  KOLMOGOROV-SMIRNOV  STATISTIC 

I.   General  Description 

The program computes the Kolmogorov-Smirnov (K-S) D statistic

$$D = \sup_{\text{all } x} \left| F_n(x) - F(x) \right|$$

where $F_n$ is the sample cumulative distribution for a sample of size $n$ and $F(x)$ is the specified cumulative distribution.

II.   Theoretical Discussion

The empirical distribution function, which is determined by the ordered sample $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$,

$$F_n(x) = \begin{cases} 0 & x < x_{(1)} \\ j/n & x_{(j)} \le x < x_{(j+1)} \\ 1 & x \ge x_{(n)} \end{cases}$$

will generally differ from the population distribution function. If the sample distribution differs greatly from the specified distribution $F(x)$, the size of the difference may help determine whether to accept the hypothesized distribution as correct. The Kolmogorov-Smirnov test uses the maximum absolute difference $|F_n(x) - F(x)|$.
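$F_n$ is a step function and $F$ is continuous, so the supremum occurs at, or just before, one of the order statistics. D can therefore be computed from the sorted sample alone. The Python sketch below (names are illustrative) applies this to the ten observations of the Lindgren example below, under the hypothesized normal distribution with mean 32 and variance 3.24.

```python
from math import erf, sqrt


def ks_statistic(sample, cdf):
    """D = sup over x of |F_n(x) - F(x)| for a continuous hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for j, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (j - 1)/n to j/n at x_(j); check both sides of the jump.
        d = max(d, j / n - f, f - (j - 1) / n)
    return d


def normal_cdf(x, mu=32.0, var=3.24):
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))


sample = [31.0, 31.4, 33.3, 33.4, 33.5, 33.7, 34.4, 34.9, 36.2, 37.0]
D = ks_statistic(sample, normal_cdf)   # about 0.56, as in the example
```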

Example (see Lindgren [4]):

Consider testing the hypothesis that a distribution is normal with
mean = 32 and variance = 3.24, using 10 sample observations:

    31.0   31.4   33.3   33.4   33.5
    33.7   34.4   34.9   36.2   37.0

$F_n(x)$ and $F(x)$ are sketched below.

[Figure: sketch of F_n(x) and F(x); not reproduced]

D = .56, which is the maximum of $|F_n(x) - F(x)|$.

At the .95 confidence level the critical value is D' = .40925.*

Since D > D', the distribution being tested is rejected at the 5% level.

*  See Owen [5].

III.   Notes

Supply only the parameters needed to determine the specified distribution. At present, the program calculates the Kolmogorov-Smirnov statistic for the following distributions, given the proper parameters:


A.  Normal
      1.  mean
      2.  variance
B.  Central Chi-Square
      1.  degrees of freedom
C.  Noncentral Chi-Square
      1.  degrees of freedom
      2.  noncentrality parameter*
D.  Central F
      1.  degrees of freedom numerator
      2.  degrees of freedom denominator
E.  Noncentral F
      1.  degrees of freedom numerator
      2.  degrees of freedom denominator
      3.  noncentrality parameter*
F.  Central Beta
      1.  degrees of freedom numerator
      2.  degrees of freedom denominator
G.  Noncentral Beta
      1.  degrees of freedom numerator
      2.  degrees of freedom denominator
      3.  noncentrality parameter*
H.  Student's t
      1.  degrees of freedom
I.  Gamma
      1.  A
      2.  B
    where $F(x) = \int_0^x \frac{t^{A-1} e^{-t/B}}{\Gamma(A)\, B^A}\, dt$
J.  Exponential - special case of Gamma
      1.  A = 1
      2.  B = 1/L, where L = rate of occurrence
K.  Noncentral t - transform to noncentral F
      1.  degrees of freedom numerator
      2.  degrees of freedom denominator
      3.  noncentrality parameter*

*  The noncentrality parameter λ is defined as in Graybill [3], which gives its further development.


IV.   Sample Programs

A.  Normal - type 1
    μ = 32
    σ² = 3.24
    K-S(C)(1)*32**3.24*.

B.  Central Chi-Square - type 2
    d.f. = 24
    K-S(C)(2)****(24).

C.  Noncentral Chi-Square - type 2
    d.f. = 10
    λ = .25
    K-S(C)(2)****(10)()*.25*.

D.  Central F - type 3
    d.f. numerator = 5
    d.f. denominator = 16
    K-S(S1)(3)****(5)(16).

E.  Noncentral F - type 3
    d.f. numerator = 10
    d.f. denominator = 13
    noncentrality parameter (λ) = 2.5
    K-S(S2)(3)****(10)(13)*2.5*.

F.  Central Beta - type 4
    d.f. numerator = 5
    d.f. denominator = 25
    K-S(S2)(4)****(5)(25).

G.  Noncentral Beta - type 4
    d.f. numerator = 8
    d.f. denominator = 20
    noncentrality parameter = 2.8
    K-S(S1)(4)****(8)(20)*2.8*.

H.  Student's t - type 5
    d.f. = 23
    K-S(C)(5)****(23).

I.  Gamma - type 6
    A = 10
    B = 3
    K-S(S3)(6)*10**3*.

J.  Exponential - type 6
    A = 1
    B = 1/.5 = 2, where L = .5
    K-S(S2)(6)*1**2*.

K.  Noncentral t - type 3
    A noncentral t distribution can be transformed into a noncentral
    F distribution by squaring the values.
    Degrees of freedom = 5
    Noncentrality parameter = .25

    TRA(C).
    MUL(1)(1)(1).
    OUT(S1)(1).
    END P
    K-S(S1)(3)****(1)(5)*.25*.


V.   Parameters


The following parameters follow the mnemonic K-S:

Parameter
Number          Description

  1             Input address.  CARDS or SEQUENTIAL 1-15.
  2             Specified distribution (see Section IV).
  3             Floating point value (Mean, A).
  4             Floating point value (Variance, B).
  5             Degrees of freedom (numerator).
  6             Degrees of freedom (denominator).
  7             Floating point value of noncentrality parameter.
  8             Output address of x_i and F(x_i).
                SEQUENTIAL 1-15 and/or PRINT.


VI.   References 
[3]   Graybill, F.A., An Introduction to Linear Statistical Models,
FACTOR ANALYSIS PACKAGE




BINORMAMIN 


I.  General  Description 

BINORMAMIN  ROTATION  rotates  a  matrix,  F,  of  orthogonal  factor  loadings 
to  oblique  simple  structure. 

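It does this by iterating for T in FT = A (where A is the rotated factor pattern) so as to minimize

$$K = \sum_{p} K_p, \qquad K_p = \sum_{q \ne p} \frac{\sum_j \left(v_{jp}^2/h_j^2\right)\left(v_{jq}^2/h_j^2\right)}{\sum_j \left(v_{jp}^2/h_j^2\right) \sum_j \left(v_{jq}^2/h_j^2\right)}$$

where $v_{jp}$ is the element of the reference vector structure V for variable j and reference vector p, and $h_j^2$ is the communality of variable j.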

Since solving directly for K is too complex, BINORMAMIN takes one vector at a time, rotating it against all the others, to minimize each $K_p$.
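The BINORMAMIN criterion K can be evaluated directly for a trial V. The NumPy sketch below, with hypothetical names, only computes K and the $K_p$ for a given V and set of communalities. It does not perform BINORMAMIN's vector-by-vector rotation. The form of the ratio is an assumption: each loading is normalized by its row communality, and each pairwise sum by the two column totals, reflecting the double normalization the name refers to. Check it against Kaiser and Dickman or Harman before relying on it.

```python
import numpy as np


def binormamin_criterion(v, h2):
    """Return (K, K_p) for reference-structure loadings v (n x m) and communalities h2 (n,)."""
    w = np.asarray(v, dtype=float) ** 2 / np.asarray(h2, dtype=float)[:, None]  # v_jp^2 / h_j^2
    cross = w.T @ w                     # cross[p, q] = sum_j w_jp * w_jq
    col = w.sum(axis=0)                 # sum_j w_jp
    ratio = cross / np.outer(col, col)
    np.fill_diagonal(ratio, 0.0)        # the sum runs over q != p
    k_p = ratio.sum(axis=1)
    return k_p.sum(), k_p
```

Perfect simple structure, where each variable loads on only one reference vector, gives K = 0. Loadings spread evenly over all vectors give larger values.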

Its name comes from the fact that it uses a double (BI) NORMAlization in seeking a MINimum.

For  further  information  see : 

1.  Kaiser, H. F. and Dickman, K. W., "Analytic Determination of Common Factors". Unpublished manuscript, University of Illinois, 1959.

2.  Harman, H. H., Modern Factor Analysis. Chicago, University of Chicago Press, 1960, pp. 326ff.

II.   Restrictions

Input is limited to matrices of 150 x 30 or less.


III.   Parameters


After the program name, BINORMAMIN, come the following parameters:

Parameter
Number          Use or Meaning

  1             Input Address of factor matrix, F.
  2             Output Address of factor matrix, F.
  3             Output Address of the transformation matrix, T.
  4             Output Address of the reference vector structure, V.
  5             Output Address of the correlations between
                reference vectors.
  6             Output Address of the primary factor pattern, P.
  7             Output Address of the correlations between factors.
  8             Maximum number of iterations (see note).  If blank,
                the maximum will be set at 100 iterations.
  9             Convergence criterion (see note).
                A.  Defined zero change: iterating will stop when each
                    element in V changes by less than A.  (A must be
                    less than .2 and no less than .0000001.)
                B.  Defined zero rotation: iterating will stop when
                    each vector in T changes by less than θ, where θ
                    is the angle whose cosine is B.  (B must be less
                    than 1.0 and no less than .2.)
                If left blank, A will be set to .001.
 10             If an initial T is to be read in, input address of T.
 11             Output Address of the initial T.

Note  on  Output:   A.   Any  output  option  left  blank  will  not  be  output. 

B.  The  program  will  always  print  out  the  program 
name  and  the  number  of  iterations  actually  done. 

C.  The  program  will  print  out  the  largest  change  in 
V,  unless  option  9B  is  used. 

D.  All  data  printed  out  is  to  7  decimal  places. 

Note on parameters 8 and 9: The program will stop at whichever criterion it meets first.


Note on parameter 9: This parameter is a floating point constant and therefore must be enclosed in asterisks, with a decimal point, as in the example:

Example:  BINORMAMIN(CARDS)()()(SEQ 1/PRINT)()()(PRINT)()*.0001*.

This stores V on SEQ 1 and also prints V and the correlations between factors. On the last iteration, no element in V changed by more than .0001, unless the maximum of 100 iterations was reached.


CENTROID  FACTOR  ANALYSIS 


I.   General Description

CENTROID FACTOR ANALYSIS computes a set of f linearly independent vectors (factors) that are mutually uncorrelated. Normally, a factor analysis decomposes a matrix of correlations, R, into a set of f factors. The factors are arrayed as column vectors in the factor matrix, F, such that

$$R = FF' + R_{n-f}$$

where $R_{n-f}$ is the matrix of residual effects. The k-th factor is computed by dividing each column sum of the current residual matrix (elements $r_{ij(k)}$) by the square root of the total sum of its elements:

$$f_{ik} = \frac{\sum_j r_{ij(k)}}{\sqrt{\sum_i \sum_j r_{ij(k)}}}$$

Between factor extractions, the variables in the residual matrix are successively reflected until all the column sums are positive.
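Below is a NumPy sketch of one centroid extraction with reflection. It rests on stated assumptions. At each reflection, the variable with the most negative column sum is reflected, because the text does not say which variable is chosen. The diagonal is assumed to already hold communality estimates. The reflections are undone on the loadings before the residual is formed. The function names are hypothetical.

```python
import numpy as np


def centroid_factor(r):
    """Extract one centroid factor; return (loadings, residual matrix)."""
    r0 = np.array(r, dtype=float)
    a = r0.copy()
    signs = np.ones(a.shape[0])
    for _ in range(10 * a.shape[0]):          # guard against an endless loop
        col = a.sum(axis=0)
        j = int(np.argmin(col))
        if col[j] >= 0:
            break
        a[j, :] *= -1                         # reflect variable j (its diagonal is unchanged)
        a[:, j] *= -1
        signs[j] *= -1
    col = a.sum(axis=0)
    total = col.sum()
    if total <= 0:
        raise ValueError("nothing left to extract")
    loadings = signs * col / np.sqrt(total)       # f_ik, with reflections undone
    residual = r0 - np.outer(loadings, loadings)  # residual for the next factor
    return loadings, residual


def centroid_factors(r, n_factors):
    """Extract n_factors factors in turn; return (F, final residual)."""
    resid = np.array(r, dtype=float)
    columns = []
    for _ in range(n_factors):
        f, resid = centroid_factor(resid)
        columns.append(f)
    return np.column_stack(columns), resid
```

For an exactly one-factor matrix $ll'$, one extraction recovers $l$ (up to the reflections) and leaves a zero residual.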

For  more  detailed  discussion  see: 

1.  L. L. Thurstone, Multiple Factor Analysis, Chicago, University of Chicago Press, 1947, pp. 149-175.

2.  Harry Harman, Modern Factor Analysis, Chicago, University of Chicago Press, 1960, pp. 192-215.

II.   Restrictions

The input matrix for the CENTROID program must not exceed 190 x 190. The input matrix must also be square, symmetric, and positive definite or semi-definite. Correlation, covariance, or cross-product matrices are commonly used as input data. Any communality estimates (changes to the diagonal elements) must be introduced before the data is passed to the CENTROID program. Incorrectly estimated communalities can make the matrix indefinite (no longer positive semi-definite) and could conceivably cause a hang-up.

The input data may come from any storage medium that conforms to SOUPAC. Similarly, the output codes follow the established conventions and are at the option of the user.


The input matrix may be completely factored (i.e., N factors from an N-variable matrix). However, factoring may be stopped by any of three criteria:

1.  The user may specify the number of factors to be extracted. This criterion sets an upper limit beyond which factoring will not be done. Consequently, it is advisable to set this limit to its maximum value in cases where it is not the primary criterion. (Set it equal to the number of variables.)

2.  The per cent of total variance removed from the R matrix is a second limiting criterion. This parameter also sets an upper limit on the process. Therefore, it should be set at 100 per cent unless it is the criterion for stopping.

3.  The last criterion is to stop when the factor contribution falls below 1. Whether this procedure is used depends on the presence or absence of its associated parameter.

If all three criteria are used simultaneously, factoring will be stopped by whichever criterion is met first.


III.   Parameters


The parameters needed by the program follow the program name on the program call card. They must appear in the order below:


Parameter
Number          Use or Meaning

  1             Input Address.  CARDS or SEQUENTIAL 1-15.
  2             Output Address.  SEQUENTIAL 1-15 and/or PRINT.
  3             Maximum number of factors to be extracted.
                This must be less than or equal to the
                order of the input matrix.
  4             Per cent of total variance to be removed,
                expressed as an integer between 0 and 100.
  5             The presence of any number greater than 0
                in this parameter indicates that factoring
                should stop when the factor contribution
                falls below unity.
  6             Output Address of Residual Matrix.

If parameters 3 and 4 are left blank, they will be set by default to
their maximum possible values and a message will be printed.

Residual  Matrix  must  be  stored  before  it  can  be  printed. 

Example: Assume that you have 77 variables and that the correlation matrix is stored on SEQ 1. Legal forms of the CENTROID call statement then include:

CENTROID(SEQ 1)(PRINT)(77)(100)(1).

CENTROID(SEQ 1)(P(F))(50)(80).

CENTROID(SEQ 1)(SEQ 2/P)(20)(100)(1)(SEQ 3/P).

CENTROID(SEQ 1)(SEQ 2/P(F))(15)(90)()(SEQ 3/P(F)).

CENTROID(SEQ 1)(P).    In this case, number of factors = 77 and per cent of variance = 100 will be assumed by default.




COMMUNALITY  ESTIMATION 


I.   General Description

This program offers five methods of COMMUNALITY ESTIMATION. In each case the estimates replace the diagonal elements of the matrix. They are as follows:


Code Number     Method

  1             The element of largest absolute magnitude in each
                row replaces the diagonal element of the row.

  2             The square of the multiple R of each variable with
                all others replaces the diagonal entry for that
                variable.  (See Special Comment Number 2.)

  3             Communalities produced by another analysis are
                input from cards or another storage medium.

  4             For each row i,
                $\left( \sum_{j=1}^{N} r_{ij}^2 / N \right)^{1/2}$
                replaces the diagonal entry for that row.  This is
                the square root of the average square across the row.

  5             For each row i,
                $r^*_{ik} (S_i - r^*_{ik}) / (S_k - r^*_{ik})$
                replaces the diagonal entry for that row, where
                $r^*_{ik} = \max_j |r_{ij}|$ (attained in column k),
                $S_i = \sum_j |r_{ij}|$, and $S_k = \sum_j |r_{kj}|$.
                This method of COMMUNALITY ESTIMATION is due to
                Professor L. Tucker.
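Below is a NumPy sketch of the four computed methods (code 3 reads values from input), with a hypothetical function name. The text does not say whether the diagonal takes part in the row maxima and sums. Every diagonal element of a correlation matrix is 1, so including it would make code 1 always return 1; the sketch therefore assumes the diagonal is excluded. For code 4 it keeps the divisor N shown in the formula.

```python
import numpy as np


def communality_estimates(r, code):
    """Diagonal replacements for code numbers 1, 2, 4 and 5 (code 3 is read from input)."""
    r = np.asarray(r, dtype=float)
    n = r.shape[0]
    off = np.abs(r - np.diag(np.diag(r)))     # |r_ij| with the diagonal excluded (assumption)
    if code == 1:                              # largest absolute element in the row
        return off.max(axis=1)
    if code == 2:                              # squared multiple correlations
        return 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    if code == 4:                              # root mean square across the row, divisor N
        return np.sqrt((off ** 2).sum(axis=1) / n)
    if code == 5:                              # Tucker's estimate
        k = off.argmax(axis=1)                 # column k holding r*_ik
        r_star = off[np.arange(n), k]
        s = off.sum(axis=1)                    # S_i
        return r_star * (s - r_star) / (s[k] - r_star)
    raise ValueError("code must be 1, 2, 4 or 5")


R = np.array([[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]])
tucker = communality_estimates(R, 5)
```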


II.   Restrictions

Input is restricted to correlation matrices of order 150 or less.


III.   Parameters


The parameters for the COMMUNALITY ESTIMATION program appear on the
program call card.  They must follow the program name in this order:


Parameter
Number          Use or Meaning

  1             Input Address.  CARDS or SEQUENTIAL 1-15.
  2             Output Address.  SEQUENTIAL 1-15 and/or PRINT.
  3             Code Number.  (See General Description.)
  4             Input Address if Option 3 is used.


IV.   Special Comments


(1) If the correlation matrix and communalities are both input from cards, the correlation matrix precedes the communality estimates. (See Code Number 3 in the General Description.)

(2) If the input correlation matrix is singular, or very nearly so, squared multiple correlations computed by standard procedures may be subject to considerable error, and will usually exceed unity for several variables. The user should therefore be aware of the characteristics of the matrix. The program checks that every $R^2$ is less than or equal to 1.0. If any is not, execution will cease and a message will be printed stating that the correlation matrix is virtually singular. Ledyard Tucker devised an alternative procedure for finding $R^2$ in singular matrices; information is available in the SOUPAC office.


V.   Reference 

Harman, Harry: Modern Factor Analysis, Chicago, University of Chicago Press, 1960, pp. 83-90.


ITERATIVE FACTOR ANALYSIS


I.   General Description

A.   Procedural

Depending on the option chosen, this routine provides one of four iterative factorization methods:

1.  Alpha factor analysis (AFA, Kaiser, 1962)

2.  Canonical factor analysis (CFA, Rao, 1955; Harris, 1962)

3.  Stepwise maximum likelihood factor analysis (MLFA, Lawley, 1940)

4.  Iterative principal axis factor solution (IPRAX, traditional)

All four methods have in common that communalities and factor loadings
are estimated simultaneously.  In three cases (AFA, CFA, IPRAX) the
number-of-factors decision can be made beforehand by the user, or it
can be left to the program, in which case an appropriate modification of
Guttman's lower-bound criterion is used.

The four methods differ from each other in theory with respect to the
defining criterion of optimization, and consequently they differ
technically with respect to the matrix that is diagonalized in each case.

1.   AFA  (Kaiser) 

Optimization criterion: maximize the alpha-reliabilities
(Cronbach) of the retained factors.  If the number-of-factors
decision is left to the program, the Kaiser modification of
the Guttman criterion is used, and all factors with
positive alpha-reliability are iterated upon.

The diagonalization is on the matrix C in

    $C = H^{-1}(R - U^2)H^{-1}$,  so that  $C = Q\Theta Q'$,

where R is an n x n input matrix of covariances, $H^2$ is a diagonal
matrix of communalities, $U^2 = I - H^2$ is a diagonal matrix of
uniquenesses, and Q is an n x m matrix of latent vectors corresponding
to the m largest latent roots in $\Theta$, which are used to recompute
new estimates of $H^2$ through $F = HQ\Theta^{1/2}$ (as
$H^2 = \mathrm{diag}(FF')$).  An initial set of $H^2$ is provided by
$I - [\mathrm{diag}(R^{-1})]^{-1}$, which is equivalent to the squared
multiple correlations of R if R itself is a correlation matrix.


Invariance  under  scaling:   Kaiser  has  shown  that  the  resulting 
factors  will  be  invariant  under  scaling,  i.e.,  if  a  covariance 
matrix  R  gives  rise  to  a  factor  matrix  F  then  the  covariance  matrix 
SRS  will  give  rise  to  a  factor  matrix  SF  (S  diagonal). 

Behavior of latent roots and alpha-reliabilities: the n-m rejected
roots of C add to zero at each stage (i.e., C is non-Gramian), and
the m accepted roots are simple functions of the alpha-reliabilities
of the retained factors in F.  These reliabilities are output
by this sub-program.
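The AFA cycle described above can be sketched in Python with NumPy as follows. This is a minimal sketch assuming R is a correlation matrix (so $U^2 = I - H^2$ and the reduced matrix is R with $H^2$ on the diagonal), a fixed m, and convergence judged on the largest change in $H^2$ against ETA; the function name and return values are hypothetical, and the program's own stopping and error handling may differ.

```python
import numpy as np


def alpha_factor_analysis(r, m, max_cycles=50, eta=1e-3):
    """Sketch of the AFA iteration for an n x n correlation matrix r.

    Returns (F, H^2, retained roots, cycles used).
    """
    r = np.asarray(r, dtype=float)
    # Initial H^2 = I - [diag(R^-1)]^-1, i.e. the squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(r))
    for cycle in range(1, max_cycles + 1):
        h = np.sqrt(h2)                       # assumes all communalities > 0
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)         # R - U^2 with U^2 = I - H^2
        c = reduced / np.outer(h, h)          # C = H^-1 (R - U^2) H^-1
        theta, q = np.linalg.eigh(c)          # roots in ascending order
        keep = np.argsort(theta)[::-1][:m]    # indices of the m largest roots
        theta_m, q_m = theta[keep], q[:, keep]
        f = h[:, None] * q_m * np.sqrt(theta_m)   # F = H Q Theta^(1/2)
        new_h2 = np.sum(f ** 2, axis=1)           # H^2 = diag(FF')
        converged = np.max(np.abs(new_h2 - h2)) < eta
        h2 = new_h2
        if converged:
            break
    return f, h2, theta_m, cycle
```

With exact one-factor data (for example loadings .8, .7, .6, .5) the communalities converge to the squared loadings and the single retained root approaches n = 4; since the diagonal of C is then all ones, the n-m rejected roots sum to zero, in line with the statement above.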




2.   CFA (Rao, Harris)

Optimization criterion: maximize the correlations between m
linear combinations of the common parts of the variables and m
linear factors that are canonically correlated (Hotelling) with
the variables in the common-factor space.  If the number-of-factors
decision is left to the program, the Harris modification of the
Guttman criterion is used, leading to a Gramian $R - U^2$ of
minimum rank.


The diagonalization is on the matrix

    $C = U^{-1} R U^{-1}$,  so that  $C = Q\Theta Q'$,

where $F = UQ(\Theta - I)^{1/2}$ is used to recompute new estimates for
$U^2$, retaining the m largest roots of C in $\Theta$.  The notation is the
same as in section 1 (AFA).  An initial set of $U^2$ is provided by
$[\mathrm{diag}(R^{-1})]^{-1}$.

Invariance under scaling: the resulting factors are again invariant
under scaling as defined in section 1 (AFA).

Behavior of latent roots; chi-square criterion: Rao has shown
that the n-m rejected roots approach unity at convergence.  For
data of exact rank m they will be "exactly" unity within the
tolerance of the convergence criterion ETA (see section III - B).
For data containing random error, their departure from unity
provides a likelihood-ratio test of the hypothesis that the
population matrix $P - V^2 = GG'$, where P, V, G are population
parameters corresponding to R, U, F in the sample, is of rank m
or less.  A criterion for this test is computed by this
sub-program and can be compared with two chi-square approximations,
which are also output by this sub-program.  Note that such a
chi-square test is valid only if the iterative process has indeed
converged, as indicated by the maximal discrepancy between trial
vectors, which is printed out for that purpose.
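A corresponding sketch of the CFA cycle, under the same assumptions as before (correlation input, fixed m, hypothetical names). It returns all roots of C, largest first, so that the approach of the n-m rejected roots to unity can be inspected; it assumes the retained roots exceed 1, so that $(\Theta - I)^{1/2}$ is real, and it does not compute the chi-square criterion.

```python
import numpy as np


def canonical_factor_analysis(r, m, max_cycles=50, eta=1e-3):
    """Sketch of the CFA iteration; returns (F, U^2, all roots of C, cycles)."""
    r = np.asarray(r, dtype=float)
    # Initial U^2 = [diag(R^-1)]^-1
    u2 = 1.0 / np.diag(np.linalg.inv(r))
    for cycle in range(1, max_cycles + 1):
        u = np.sqrt(u2)                          # assumes all uniquenesses > 0
        c = r / np.outer(u, u)                   # C = U^-1 R U^-1
        theta, q = np.linalg.eigh(c)
        order = np.argsort(theta)[::-1]          # all roots, largest first
        theta, q = theta[order], q[:, order]
        # F = U Q (Theta - I)^(1/2) over the m largest roots (assumed > 1)
        f = u[:, None] * q[:, :m] * np.sqrt(theta[:m] - 1.0)
        new_u2 = np.diag(r) - np.sum(f ** 2, axis=1)   # U^2 = diag(R - FF')
        converged = np.max(np.abs(new_u2 - u2)) < eta
        u2 = new_u2
        if converged:
            break
    return f, u2, theta, cycle
```

For exact one-factor data the uniquenesses converge to one minus the squared loadings and the three rejected roots settle at 1.0, which is the behavior Rao's result describes.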

3.  MLFA (Lawley)

The CFA variant of the program can be used for a stepwise maximum
likelihood factorization in the Lawley-Rao sense.


Optimization criterion: maximize the likelihood function
corresponding to the multivariate normal distribution with covariance
matrix parameters $P = GG' + V^2$ (as defined in section 2, CFA), given
the sample matrix R, under choice of G and V and observing the
side-conditions that $P - V^2$ is Gramian and V is diagonal with
$0 < v_j \le 1$.

The diagonalization is the same as in CFA, section 2; hence, the
resulting factors are again invariant under scaling as defined in
section 1 (AFA).




In contrast to CFA, however, the number-of-factors decision is
made on statistical grounds.  The user starts with a reasonable
guess for m (preferably m < n/2, to ensure positive degrees of
freedom for the chi-square test, which is otherwise bypassed).
After convergence has been obtained, the user inspects the
chi-square statistic.  If the statistic is below the table value
at the chosen probability level (.05 or .01), then the hypothesis
can be accepted at this level with the corresponding risk, and the
user has the option to reduce m for a second run, etc.  On the other
hand, if the adjusted statistic exceeds the table value, then m must
be raised until the adjusted statistic warrants acceptance of the
hypothesis.

Within the package the user is free to re-enter the routine
repeatedly with sequentially decreasing or increasing m specified on
the call card.  Since the test assumes convergence, it is important
that the maximum number of iterations be set large enough for
convergence to occur within the chosen tolerance bound ETA
(see section III - B).
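To check the degrees-of-freedom condition before a run, the sketch below uses the usual degrees of freedom for the likelihood-ratio test of m common factors in n variables, $\frac{1}{2}[(n-m)^2 - (n+m)]$. The text does not state the formula the program uses, so treat this as an assumption; the hypothetical helper `next_m` simply encodes the accept-and-lower, reject-and-raise rule described above.

```python
def ml_degrees_of_freedom(n, m):
    """Degrees of freedom for testing that n variables fit m common factors."""
    # (n - m)^2 and (n + m) always have the same parity, so this is exact.
    return ((n - m) ** 2 - (n + m)) // 2


def largest_testable_m(n):
    """Largest m that still leaves positive degrees of freedom."""
    m = 0
    while ml_degrees_of_freedom(n, m + 1) > 0:
        m += 1
    return m


def next_m(m, statistic, critical_value):
    """Stepwise rule: accept and try one factor fewer, or reject and add one."""
    return m - 1 if statistic < critical_value else m + 1
```

For n = 10 variables, `ml_degrees_of_freedom(10, 2)` is 26, and the largest m with positive degrees of freedom is 5.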

4.  IPRAX (traditional)

Optimization criterion: none

The diagonalization is on the matrix

    $C = R - U^2$,  so that  $C = Q\Theta Q'$,

where $F = Q\Theta^{1/2}$ is used to recompute $H^2 = I - U^2$ (as
$H^2 = \mathrm{diag}(FF')$), retaining the m largest roots in $\Theta$.
The notation is the same as in section 1 (AFA).  An initial set of
$H^2$ is provided by the identity matrix.  If the number-of-factors
decision is left to the program, the unmodified Guttman criterion is
used, i.e., all factors corresponding to roots of the input matrix R
which exceed unity are retained.

Invariance under scaling, as defined in section 1 (AFA), is not
obtained by this method.

The behavior of the latent roots is not known at present.  No
statistical or other significance can be attached to the m largest
or the n-m smallest roots of C.
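The traditional IPRAX cycle can be sketched the same way. Assumptions, as before: correlation input, convergence judged on the largest change in $H^2$, and hypothetical names; negative roots are clipped to zero here, which may not match the program's handling.

```python
import numpy as np


def iprax(r, m=None, max_cycles=50, eta=1e-3):
    """Sketch of iterative principal axes; returns (F, H^2, m, cycles)."""
    r = np.asarray(r, dtype=float)
    if m is None:
        # Unmodified Guttman criterion: count the roots of R that exceed unity.
        m = int(np.sum(np.linalg.eigvalsh(r) > 1.0))
    h2 = np.ones(r.shape[0])                  # initial H^2 = I
    for cycle in range(1, max_cycles + 1):
        c = r.copy()
        np.fill_diagonal(c, h2)               # C = R - U^2, with U^2 = I - H^2
        theta, q = np.linalg.eigh(c)
        keep = np.argsort(theta)[::-1][:m]    # m largest roots
        # F = Q Theta^(1/2); negative roots are clipped to zero.
        f = q[:, keep] * np.sqrt(np.clip(theta[keep], 0.0, None))
        new_h2 = np.sum(f ** 2, axis=1)       # H^2 = diag(FF')
        converged = np.max(np.abs(new_h2 - h2)) < eta
        h2 = new_h2
        if converged:
            break
    return f, h2, m, cycle
```

Starting from unit communalities means the first cycle is an ordinary principal axis solution of R itself.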


5.  Both covariance matrices and correlation matrices are acceptable
as input.  If covariances are used, the tenth parameter should be 1.
In this case the covariance matrix is scaled into a correlation matrix,
and all computations, in particular the number-of-factors decision,
are based on this correlation matrix.  At the final stage the factors
are scaled back so as to account for the covariance matrix which was
input.  The matrix of residuals is computed in the metric of the
covariances.
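The scaling in 5 can be sketched as follows, assuming S is the diagonal matrix of standard deviations: each covariance is divided by the product of the two standard deviations, factors F found from the correlation matrix are returned to the covariance metric as SF (the same SF relation as in the invariance statement of section 1), and residuals are formed against the input covariances. All names are hypothetical.

```python
import numpy as np


def covariance_to_correlation(cov):
    """Return (correlation matrix, standard deviations) for a covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd), sd


def rescale_factors(f_corr, sd):
    """Scale factors of the correlation matrix back to the covariance metric (S F)."""
    return np.asarray(f_corr, dtype=float) * sd[:, None]


def covariance_residuals(cov, f_cov):
    """Residual matrix R - FF' in the metric of the covariances."""
    return np.asarray(cov, dtype=float) - f_cov @ f_cov.T
```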
II.   Restrictions


Input is restricted to matrices of order 100 x 100 or less.  Up to
50 factors can be handled by this program.  If the number-of-factors
decision is left to the program and more than 50 factors are estimated,
an appropriate message is printed and control is returned to the
system.


III.   Parameters 


The program name is ITERATIVE FACTOR ANALYSIS.  After the name on
the call card, the parameters must appear in the following order:


Parameter
Number      Use or Meaning

1           Input Address of correlation or covariance matrix.
            CARDS or SEQUENTIAL 1-15.

2           Output Address of correlation or covariance matrix.
            SEQUENTIAL 1-15 and/or PRINT.

3           Output Address for principal axis factors.
            SEQUENTIAL 1-15 and/or PRINT.

4           Output Address of residual matrix.
            SEQUENTIAL 1-5 and/or PRINT.

5           Option code:
              0 if IPRAX
              1 if ALPHA
              2 if CANONICAL
              3 if STEP-WISE MAXIMUM LIKELIHOOD
6           Maximum number of cycles to be executed.  If left blank,
            50 cycles will be used as the upper limit.

7           Number of factors to be extracted.  If left blank, all
            factors with roots exceeding unity will be retained.

8           Exponent of convergence, n, where the tolerance is
            ETA = $10^{-n}$.  If left blank, n = 3, i.e., ETA = $10^{-3}$.
            If all goes well, the program stops as soon as either one
            of the stopping criteria (maximum number of cycles, or
            tolerance ETA) is met.  Error stops, if they occur, are
            labelled accordingly.

9           Sample size (for CFA only).  If left blank, the chi-square
            computations are bypassed.  If specified, chi-square is
            computed with this sample size.

10          1 if the input matrix is a covariance matrix.


Output common to all four sub-programs

1.  Matrix output within system conventions:

a.  R (input covariance matrix)

b.  F (factor matrix)

c.  R - FF' (residual matrix)

2.  Vector output, print only:

a.  communality vector (last iteration)

b.  vector of latent roots of C (last iteration)

3.  Constants, print only:

a.  number of iterations completed

b.  largest discrepancy between trial vectors
($H^2$, $U^2$, or $H^{-1}$, depending on the sub-program)

c.  root mean square of the off-diagonal residual matrix

d.  per cent of variance removed

Additional  output  specific  to  sub-programs 

AFA:   The  alpha-reliabilities  of  the  m  retained  factors 

CFA:   chi-square  statistic,  chi-square  approximations  (Wilson, 
Hilferty)  for  p  =  .05  and  p  =  .01,  for  comparison  with 
statistic.   Degrees  of  freedom. 
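The Wilson-Hilferty approximation named above replaces the upper-p point of chi-square with df degrees of freedom by $df\,\bigl(1 - \tfrac{2}{9\,df} + z_p\sqrt{\tfrac{2}{9\,df}}\bigr)^3$, where $z_p$ is the upper-p standard normal deviate. A Python sketch for the two levels the program reports; the constant table and function name are illustrative, and this is not claimed to be the program's own code.

```python
import math

# Upper-tail standard normal deviates for p = .05 and p = .01.
Z_UPPER = {0.05: 1.6448536269514722, 0.01: 2.3263478740408408}


def chi_square_critical_wh(df, p):
    """Wilson-Hilferty approximation to the upper-p point of chi-square(df)."""
    z = Z_UPPER[p]
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * math.sqrt(a)) ** 3
```

For 10 degrees of freedom this gives about 18.29 at p = .05 and 23.24 at p = .01, against tabled values of 18.31 and 23.21.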
IV.   Special Comments


The accuracy of the computations should be approximately 6 digits,
possibly somewhat lower for a very large number of iterations.  The
effective accuracy depends on the chosen tolerance ETA and the actual
convergence, as indicated by the largest discrepancy between trial
vectors.  The chi-square approximations are within 2 x 10"^ for more
than 8 degrees of freedom.


JACOBI 


I.   General  Description 

This program calculates the eigenvalues and eigenvectors of a square,
symmetric matrix using the Jacobi rotation technique.  It is limited to
matrices of 110 rows and 110 columns.  The user should realize that this
technique is extremely slow on large matrices, while for small matrices
(up to 20 x 20) the program takes no longer than PRINCIPAL AXIS FACTOR
ANALYSIS.
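For readers unfamiliar with the method, a pure-Python sketch of the cyclic Jacobi technique: each rotation annihilates one off-diagonal element $a_{pq}$ by choosing the angle with $\tan 2\phi = 2a_{pq}/(a_{qq} - a_{pp})$, and sweeps over all pairs repeat until the off-diagonal sum of squares is negligible. Each sweep performs n(n-1)/2 rotations at O(n) work each, which is why the method becomes slow for large matrices. Names and tolerances are illustrative, not those of the SOUPAC routine; results are returned in descending order of eigenvalue.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a symmetric matrix (list of lists).

    Returns (eigenvalues, eigenvectors) in descending order of eigenvalue;
    eigenvectors[k] is the vector belonging to eigenvalues[k].
    """
    n = len(a)
    a = [[float(x) for x in row] for row in a]       # working copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle chosen so that the rotated a[p][q] becomes zero.
                phi = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(phi), math.sin(phi)
                for k in range(n):                   # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                   # A <- J' A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                   # V <- V J (eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors
```

For `[[2, 1], [1, 2]]` a single rotation of 45 degrees suffices, giving roots 3 and 1.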

II.  Parameters 


Parameter
Number      Use or Meaning

1           Input Address of correlation matrix.  CARDS or
            SEQUENTIAL 1-15.

2           Output Address of eigenvectors.

3           Output Address of principal axis factors.

4           Output Address of eigenvalues, stored as a row vector.
            PRINT is not valid; the eigenvalues are always printed.

5           Number of eigenvectors (or factors) to be output.


III.   Special  Comments 

The  eigenvalues  are  stored  in  descending  algebraic  order  (from  largest 
to  smallest),  and  the  eigenvectors  and  factors  are  placed  in  the  same  order. 


IV.   Reference 

Ralston, A. and Wilf, H. S.: Mathematical Methods for Digital Computers,
John Wiley and Sons, New York, 1964.
OBLIMAX  ROTATION


I.   General Description

The OBLIMAX OBLIQUE ROTATION transforms a set of factors F to a new
set V such that the factor kurtosis

    $K = \frac{\sum_i \sum_j v_{ij}^4}{\left(\sum_i \sum_j v_{ij}^2\right)^2}, \qquad i = 1, 2, \ldots, n; \quad j = 1, 2, \ldots, k,$

is at a maximum.
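The criterion itself is simple to evaluate. A minimal Python sketch (hypothetical function name) showing that a matrix closer to simple structure has the larger K:

```python
def oblimax_k(v):
    """Factor kurtosis K = (sum of v_ij^4) / (sum of v_ij^2)^2 over all i, j."""
    s2 = sum(x * x for row in v for x in row)
    s4 = sum(x ** 4 for row in v for x in row)
    return s4 / (s2 * s2)
```

For example, K is 0.5 for the 2 x 2 identity matrix but only 0.25 when all four loadings equal .5.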


The purpose of the transformation is to attempt to rotate analytically
to a position similar to that obtained by applying Thurstone's rules for
simple structure.  (See Multiple Factor Analysis, L. L. Thurstone, 1947,
pp. 319-410.)  However, Thurstone's rules and the oblimax procedure are
not the same, and it is too much to expect that results obtained from
both procedures will agree exactly.


It would be desirable to solve directly for the transformation matrix
T, but unfortunately no solution to this problem has been found.  Instead,
oblimax takes two vectors at a time, solves for the rotational angles,
transforms the vectors, and then selects another pair until all k(k-1)
pairs have been rotated.  This process is repeated iteratively until the
criterion K no longer increases.  Despite the pairwise procedure, K is
well behaved and, in general, increases steadily toward a maximum.


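For any pair of factors, a and b, the solution proceeds as follows:

    $K_{ab} = \frac{\sum_i \sum_j (a_i \cos\theta_j + b_i \sin\theta_j)^4}{\left[\sum_i \sum_j (a_i \cos\theta_j + b_i \sin\theta_j)^2\right]^2} = \frac{\sum_i \sum_j (a_i + b_i X_j)^4}{\left[\sum_i \sum_j (a_i + b_i X_j)^2\right]^2}$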


The derivative of $K_{ab}$ is set equal to zero, resulting in a quartic
equation in X, which is $\tan\theta$.  Two solutions for X will be maxima;
as each X is found, the sign of the second derivative is inspected to
select the maxima.  A small (2 x 2) transform is created, but before
post-multiplication is performed, the transform must be adjusted so that,
when it becomes a part of T, the columns $t_a$ and $t_b$ will remain
normalized.  In this way, both V and T are developed pair by pair.
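The program solves a quartic for each pair. The sketch below replaces that step with a plain grid search over the angle, which is slower but shows the same pairwise strategy: column $t_a$ of T is replaced by a normalized combination of $t_a$ and $t_b$, all k(k-1) ordered pairs are visited in each pass, and passes stop when K no longer increases. It is an illustration under these assumptions, not the Pinzka-Saunders algorithm itself; all names are hypothetical.

```python
import numpy as np


def kurtosis(v):
    """Oblimax criterion K for a factor matrix v."""
    v = np.asarray(v, dtype=float)
    return np.sum(v ** 4) / np.sum(v ** 2) ** 2


def rotate_pair(f, t, a, b, steps=720):
    """Replace column a of T by a normalized mix of columns a and b,
    choosing the angle (by grid search) that maximizes K of F T."""
    best_k, best_t = kurtosis(f @ t), t
    for phi in np.linspace(-np.pi / 2, np.pi / 2, steps, endpoint=False):
        col = np.cos(phi) * t[:, a] + np.sin(phi) * t[:, b]
        norm = np.linalg.norm(col)
        if norm < 1e-12:
            continue
        cand = t.copy()
        cand[:, a] = col / norm               # keep t_a normalized
        k = kurtosis(f @ cand)
        if k > best_k:
            best_k, best_t = k, cand
    return best_t, best_k


def oblimax_sketch(f, max_passes=20, tol=1e-10):
    """Pairwise passes over all ordered pairs until K stops increasing."""
    f = np.asarray(f, dtype=float)
    k = f.shape[1]
    t = np.eye(k)
    for _ in range(max_passes):
        before = kurtosis(f @ t)
        for a in range(k):
            for b in range(k):
                if a != b:                    # all k(k-1) ordered pairs
                    t, _ = rotate_pair(f, t, a, b)
        if kurtosis(f @ t) - before < tol:    # K no longer increases
            break
    return t
```

Because each step keeps the current T unless K strictly improves, K can never decrease from pass to pass, mirroring the well-behaved criterion described above.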


For  references  see: 

Pinzka, C., and Saunders, D. R., "Analytic Rotation to
Simple Structure II: Extension to an Oblique Solution."
Research Bulletin RB-54-31.  Princeton, N. J.: Educational
Testing Service, 1954.


II.   Alternate Use

If the user already has a transformation matrix, he may use it to
compute $V_{rs}$, etc., by giving the input address of T in parameter 8;
in this case the oblimax calculation of T will be skipped.  If both F and
T are to be input from cards, then the data deck of F should precede the
deck of T.


III.   Output 


The OBLIMAX program always prints the following (unless parameter 8
is used):

1.  The value of K for each pass

2.  The iteration time

It outputs the following on demand (see Parameters):

1.  Transformation matrix T

2.  Reference vector structure, $V_{rs} = FT$

3.  Reference vector correlations, $C_{rs} = T'T$

4.  Diagonal of D and of $D^{-1}$, where D is the diagonal matrix of the
reciprocal square roots of the diagonal elements of $C_{rs}^{-1}$

5.  Primary factor pattern, $V_{fp} = FTD$

6.  Primary factor correlations, $C_{fp} = D C_{rs}^{-1} D$

All data are printed to seven decimal places.
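Given F and T, the on-demand outputs listed above follow directly. A NumPy sketch that implements exactly the formulas as listed (including $V_{fp} = FTD$); the dictionary keys are hypothetical labels, not SOUPAC output names.

```python
import numpy as np


def oblimax_outputs(f, t):
    """Derived OBLIMAX matrices from F and T, following the formulas listed."""
    f = np.asarray(f, dtype=float)
    t = np.asarray(t, dtype=float)
    c_rs = t.T @ t                                   # C_rs = T'T
    c_rs_inv = np.linalg.inv(c_rs)
    d = np.diag(1.0 / np.sqrt(np.diag(c_rs_inv)))    # D
    return {
        "V_rs": f @ t,                               # V_rs = FT
        "C_rs": c_rs,
        "D": np.diag(d),                             # diagonal of D
        "D_inv": 1.0 / np.diag(d),                   # diagonal of D^-1
        "V_fp": f @ t @ d,                           # V_fp = FTD
        "C_fp": d @ c_rs_inv @ d,                    # C_fp = D C_rs^-1 D
    }
```

By construction the diagonal of $C_{fp}$ is all ones. With two reference vectors correlating .6, for instance, the two primary factors correlate -.6.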

IV.   Restrictions

The number of variables plus the number of factors must be no more
than 300.


V.   Note on Parameter 2


If row normalization is specified, the normalization constants will be
preserved and the rows will be rescaled to their proper length after
rotation and prior to output.
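Because V = FT acts on each row of F separately, dividing every row by its length before rotation and multiplying it back afterwards is all the rescaling requires. A sketch with hypothetical names, assuming no row of F is entirely zero:

```python
import numpy as np


def normalize_rows(f):
    """Scale each row of F to unit length; return the rows and their lengths."""
    f = np.asarray(f, dtype=float)
    lengths = np.sqrt(np.sum(f ** 2, axis=1))    # assumes no all-zero rows
    return f / lengths[:, None], lengths


def rescale_rows(v, lengths):
    """Restore the preserved row lengths after rotation."""
    return np.asarray(v, dtype=float) * lengths[:, None]
```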

VI.   Parameters

The program name, OBLIMAX, appears first on the program call card
and is followed by the parameters below.  Any output option (except
parameter 9) may be SEQUENTIAL 1-15 and/or PRINT; it may be left blank
if not desired.


Parameter
Number      Use or Meaning

1           Input Address of F.  CARDS or SEQUENTIAL 1-15.

2           If rows are to be normalized before rotation, punch a 1;
            otherwise a zero or leave blank.  (See Note on Parameter 2.)

3           Output Address of T.

4           Output Address of $V_{rs}$.

5           Output Address of $C_{rs}$.

6           Output Address of $V_{fp}$.

7           Output Address of $C_{fp}$.

8           Input Address of T.  (See Alternate Use.)

9           D-value and inverse of D.  PRINT only.


ORTHOGONAL  PROCRUSTES;  KAISER'S  TECHNIQUE 
FOR  RELATING  FACTORS  BASED  ON  DIFFERENT  SAMPLES 


I.   General Description

This  program  offers  2  options. 

1.  Orthogonal Procrustes.  Given 2 matrices, a factor matrix A and a
target matrix B, the program solves

    $AT = B + E$

for T in a least-squares sense (i.e., minimizing tr(E'E)), under the
restriction that the transformation matrix T be orthonormal, i.e.,

    $T'T = TT' = I$.

There are no restrictions on A and B other than conformability.  In
particular, the method does NOT require full column rank in either of
these input matrices.

For  further  information  and  some  applications,  see 

Schonemann, P. H., "A Solution of the Orthogonal Procrustes
Problem with Applications to Orthogonal and Oblique Rotation,"
Unpublished Ph.D. thesis, 1964 (on file).
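The least-squares orthonormal T has a well-known closed form: if $A'B = USV'$ is a singular value decomposition, then $T = UV'$. This is the solution usually attributed to Schonemann; we assume, without having seen the program source, that the ORT routine computes an equivalent result. A NumPy sketch with a hypothetical function name:

```python
import numpy as np


def orthogonal_procrustes(a, b):
    """Orthonormal T minimizing tr(E'E) in AT = B + E, via the SVD of A'B."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u, _, vt = np.linalg.svd(a.T @ b)    # A'B = U S V'
    return u @ vt                        # T = U V'
```

Because only the SVD of A'B is needed, the sketch also runs when A or B lacks full column rank, in which case T is not unique.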

2.  Kaiser's Technique for Relating Factors Based on Different Samples.
Given 2 factor studies based on different samples but overlapping in
some (not necessarily all) variables, this technique yields, under
certain mild (non-singularity) conditions and certain strong
assumptions, a matrix of estimated cosines between the two sets of
factors, which might be interpreted as correlations in a very loose
sense ("quasi correlations").  Operationally,

    $R_{12} = R_{1c}\,R_{cc'}\,R_{c'2}$

where $R_{1c}$ and $R_{c'2}$ give the correlations of orthogonal reference
axes (such as centroid or principal axis factors) with primary factors in
each study, $R_{cc'}$ is the transformation matrix obtained when fitting
one set of tests to the other by the Orthogonal Procrustes technique
above, and $R_{12}$ is the matrix of quasi correlations relating the
factors of one study to those in the other.  This matrix need not be
square.  The program, as written, assumes centroid (or, equivalently,
principal axis) matrices and reference vector structures as input and
proceeds to compute $R_{1c}$ and $R_{c'2}$ from those, assuming this to
be most convenient for most users.  Note that the essential information
is embodied in the matrix $R_{cc'}$, which is identical with the matrix T
of the orthogonal Procrustes routine, so that the remaining algebra can
also be performed by other programs in SOUPAC, if more convenient.
For  further  information,  see:

Kaiser, H. F., "Relating Factors Between Studies Based Upon
Different Individuals."  Unpublished MS., U. of I., 1960.


II.   Restrictions

Input is restricted to matrices of order 100 x 50 or less.


III.   Parameters

The program mnemonic, ORT, appears first on the card and is followed by
up to 13 parameters:


Parameter                                           Orthogonal    Kaiser's
Number      Description                             Procrustes    Factor Matching

1           Input Address.  CARDS or S1-S15         A             centroid matrix, study 1
2           Input Address.  CARDS or S1-S15         B             centroid matrix, study 2
3           Output Address.  S1-S15 and/or PRINT
4           Output Address.  S1-S15 and/or PRINT
5           Output Address.  S1-S15 and/or PRINT
6           Output Address.  S1-S15 and/or PRINT
7           Output Address.  S1-S15 and/or PRINT
8           Input Address.  CARDS or S1-S15                       reference vector structure, study 1
9           Input Address.  CARDS or S1-S15                       reference vector structure, study 2
10          Output Address.  S1-S15 and/or PRINT;
            blank if not desired
11          Output Address.  S1-S15 and/or PRINT;
            blank if not desired
12          Output Address.  S1-S15 and/or PRINT                  $R_{12}$ ("quasi correlations")
13          Integer k                                             needed only if there are m factors
                                                                  in study 1 and k < m factors in study 2


Note:  If parameter 8 is empty, an Orthogonal Procrustes solution will
be computed and control will then be returned to the system.  If an input
address for the reference vector structure of the first study is given in
parameter 8, it functions as a switch and invokes the subroutine for
Kaiser's technique.  In this case, the reference vector structures should
be in the same order as the centroids.  All 4 matrices should contain only
variables which were common to both studies.


Rectangular $R_{12}$.  If the number of factors differs between the two
studies, say m > k, then the matrices with m factors should each precede
the corresponding matrix with k factors.  For brevity, let this input
sequence be denoted by $F_m$, $F_k$, $V_m$, $V_k$.  $F_m$, $V_m$, and $F_k$
should be read in with a format for m columns, e.g., the data card might
read "DATA(m) (mF5.3)", so that the machine image of $F_k$ will be
augmented by m-k columns of zeros (so as to allow for an Orthogonal
Procrustes fit of the centroids).  On the other hand, $V_k$ is to be read
with a format signaling k columns, e.g., its data card might read
"DATA(k) (kF5.3)" (so as to allow for inversion of $V_k'V_k$).  Finally,
the 13th parameter should be k in this case.
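Reading $F_k$ with an m-column format has the same effect as appending m-k zero columns to it, which makes the two centroid matrices conformable for the Procrustes fit. A NumPy sketch of that padding (hypothetical helper, for illustration only):

```python
import numpy as np


def pad_to_m_columns(x, m):
    """Append zero columns so that x has m columns."""
    x = np.asarray(x, dtype=float)
    rows, k = x.shape
    if k > m:
        raise ValueError("matrix already has more than m columns")
    return np.hstack([x, np.zeros((rows, m - k))])
```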


SCATTER  PLOTS 


I.   General  Description 


This  program  generates  2  different  types  of  one-page  plots  depending 
on  the  type  of  data  input.   If  only  one  row  of  data  is  input,  the  program 
generates  a  successive  value  plot.   That  is,  a  single  row  of  data  will  be 
interpreted  as  representing  successive  values  of  a  single  variable  and 
will  be  plotted  on  the  vertical  axis,  while  the  column  number  (value 
number)  is  plotted  on  the  horizontal  axis.   For  example,  this  feature  is 
especially  suitable  for  plotting  a  row  of  eigenvalues  output  from  a  factor 
analysis  program. 

If more than one row of data is input, the program generates bivariate
scatter-plots, assuming that columns represent variables.  The program
generates a separate one-page scatter-plot for each possible pair of
variables, up to a limit specified by the user.  The user cannot specify
particular pairs to be plotted.  The first variable of each pair is
plotted on the horizontal axis and the second on the vertical axis.


II.   Options 

There  are  four  options  available  in  this  program. 


1)  Bounds on axes:  The user may specify upper and lower bounds
for either or both axes.  If the user does not specify these limits,
the program will generate its own bounds for the axes.  Any point
lying beyond user-specified bounds will be omitted from the plot.


2)  Printed axes:  The user may specify whether he wants printed axes
included in a bivariate scatter-plot.  Since these axes must be
printed at the center of the graph, they should probably be
omitted if the origin of the plot to be generated will not also
be centered on the graph.

3)  Point identification:  The user may specify whether points in the
plot should be individually identified or plotted as X's.  There
are 60 identification characters available; if the user requests
identified points, the first 60 points will be identified as
follows:
POINT IDENTIFICATION TABLE

Point   ID      Point   ID      Point   ID      Point   ID
  1     A        16     P        31     5        46     K
  2     B        17     Q        32     6        47     L
  3     C        18     R        33     7        48     M
  4     D        19     S        34     8        49     N
  5     E        20     T        35     9        50     O
  6     F        21     U        36     A        51     P
  7     G        22     V        37     B        52     Q
  8     H        23     W        38     C        53     R
  9     I        24     X        39     D        54     S
 10     J        25     Y        40     E        55     T
Obviously, if there are more than 35 points on a given plot,
there will be some duplicate identification characters.  It
remains for the user to differentiate these points by virtue of
their location on the plot.

If identification is requested, the program will plot
points 61-85 as asterisks, but points beyond #85 will be omitted
entirely.  In general, if a user wishes to plot more than 60
points, it is not recommended that he specify identification
of points.

If the user specifies counted points, the program will plot
an X at the location of each point.  If there are 2 or more points
at a specific location, the program will plot a number (2, 3, ..., 9)
indicating how many points are present at the location.  If
more than 9 points exist at a specific location, the program
will plot the letter M at that location.
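The identification table follows a simple cycle: the 26 letters A-Z, then the digits 1-9, 35 characters in all, which is why duplicates begin after 35 points. A sketch assuming that cycle holds for all 60 identified points, together with the marks used for counted points (function names are hypothetical):

```python
import string

# Cycle of 35 identification characters: A-Z, then 1-9.
ID_CHARS = string.ascii_uppercase + "123456789"


def point_id(point_number):
    """Character for an identified point (1-based); None if the point is omitted."""
    if point_number <= 60:
        return ID_CHARS[(point_number - 1) % len(ID_CHARS)]
    if point_number <= 85:
        return "*"                    # points 61-85 are plotted as asterisks
    return None                       # points beyond #85 are omitted


def count_mark(count):
    """Character printed when `count` points share one location (counted mode)."""
    if count == 1:
        return "X"
    if count <= 9:
        return str(count)
    return "M"                        # more than 9 points at the location
```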

4)  Variables to be plotted:  The user may specify the number of
variables for which he wants pairwise scatter-plots.  The program
will generate bivariate plots for all possible pairs of the
set of variables specified.  For example, if the user inputs a
matrix of 10 eigenvectors and sets this parameter (#2) equal to
4, the program will generate plots for eigenvectors 1 vs 2, 1 vs 3,
1 vs 4, 2 vs 3, 2 vs 4, and 3 vs 4.
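The set of plots is every unordered pair of the first p variables, p(p-1)/2 plots in all, with the lower-numbered variable of each pair on the horizontal axis. A one-function Python sketch (hypothetical name):

```python
from itertools import combinations


def plot_pairs(num_variables):
    """All (horizontal, vertical) variable pairs plotted for the first num_variables."""
    return list(combinations(range(1, num_variables + 1), 2))
```

With parameter 2 equal to 4 this yields the six pairs of the example; with all 10 variables it would yield 45 plots.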


III.   Input


Input must be either (1) a single row of data to be converted to a
successive value plot, or (2) a data matrix to be converted to bivariate
scatter-plots.
IV.   Special Comment

Due to the various problems involved in plotting points on standard
computer output paper, these scatter-plots will be of limited accuracy.
The most obvious problem is that there is a finite number of locations at
which a given character can be printed, resulting in slight misplacement
of many points.  Thus, though the plots will present a valid and useful
picture of the general relationships present in the data, the user should
be cautious in any technical or detailed analysis of these graphs.
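To see where the misplacement comes from, suppose (as an assumption; the actual number of print positions is not given here) that an axis offers a fixed number of print positions. Each value is rounded to the nearest position, so values that differ by less than one position's width print in the same place. A hypothetical sketch:

```python
def print_position(value, lower, upper, positions):
    """0-based print position of value on an axis with `positions` slots,
    or None if the value lies outside [lower, upper] and is omitted."""
    if not lower <= value <= upper:
        return None
    fraction = (value - lower) / (upper - lower)
    return int(round(fraction * (positions - 1)))
```

With 101 positions between -1 and +1, for example, 0.0 and 0.004 both print at position 50.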


V.   Parameters


The program mnemonic (PLO) appears first on the program card and is
followed by up to 7 parameters; only the first parameter is required.
Parameter 1 is an address; parameters 2 and 3 are integers enclosed in
parentheses; parameters 4-7 are floating point numbers enclosed in
asterisks.  Parameters 4 and 5 should be either both specified or both
blank.  Also, parameters 6 and 7 should be either both specified or
both blank.


Parameter
Number      Use or Meaning

1           Input address (cards or S1-S5).

2           Number of variables to be plotted.  (The value of this
            parameter is irrelevant if input is a single row.  If this
            parameter is omitted in a bivariate scatter-plot program,
            all possible pairs of variables will be plotted.)

3           Code for point identification and inclusion of axes:
              (-1)  axes omitted; points counted
              (-2)  axes omitted; points identified
              (1)   axes included; points counted
              (2)   axes included; points identified
            Default is (-1).

4           Upper bound of X-axis

5           Lower bound of X-axis

6           Upper bound of Y-axis

7           Lower bound of Y-axis


VI.   Example 

/*ID
// EXEC SOUP
//SYSIN DD *
CORRELATIONS (CARDS) (P) (S1/P)
PRINCIPAL AXIS (S1) (S2/P) (10) (100) (0) (S3/P) (S4)
PLOT (S2) (4) (2).
PLOT (S3) (4) (2) *1* *-1* *1* *-1*.
PLOT (S4).
ENDS
DATA (10) (10F3.0)

data deck

END#
/*
/* 


This program would do the following: compute a correlation matrix for
the data on cards; factor the correlation matrix, storing 10 factors on
S2, 10 eigenvectors on S3, and a row of eigenvalues on S4; PLOT all
pairs of the first 4 factors, with axes printed, points identified, and
the program setting its own bounds for the axes; PLOT all pairs of the
first 4 eigenvectors, with axes printed, points identified, and
bounds of +1 and -1 for both axes; PLOT the row of eigenvalues as a
successive value plot with axes omitted and points marked as X's.


*This program was developed from a program written by
F. W. Young of the University of North Carolina.


PRINCIPAL  AXIS  FACTOR  ANALYSIS 
(Eigenvalues  and  Vectors) 


I.   General Description

The purpose of PRINCIPAL AXIS FACTOR ANALYSIS is to determine a
factor matrix, F, given a Gramian matrix, R, of order n, such that

    $F_{(n,f)}\,F'_{(f,n)} = R^*_{(n,n)}$,

where R* is an approximation to R.

The column vectors of F are defined as the factors (measures of
dimensionality) of the original matrix, R.  The solution for the matrix
F is the classical eigenproblem.  Consequently, the computations are
done by an eigenvalue subroutine.  Before output, the eigenvectors, E,
are scaled as follows:

    f(i,j) = e(i,j)*lambda(j)**.5,    i = 1, ..., n;  j = 1, ..., n,

to generate the principal axis factors, F.  (See Introduction on Factor Analysis.)
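The scaling step can be sketched with NumPy: take the eigenvalues and eigenvectors of R, order them from largest to smallest, and multiply each eigenvector by the square root of its root. The function name is hypothetical and the eigenvalue routine is NumPy's, not SOUPAC's.

```python
import numpy as np


def principal_axis_factors(r, num_factors=None):
    """Principal axis factors f(i,j) = e(i,j) * lambda(j) ** .5, largest root first."""
    r = np.asarray(r, dtype=float)
    lam, e = np.linalg.eigh(r)             # ascending eigenvalues
    order = np.argsort(lam)[::-1]          # largest first
    lam, e = lam[order], e[:, order]
    if num_factors is not None:
        lam, e = lam[:num_factors], e[:, :num_factors]
    # A negative root gives NaN here, a reminder that its vector is meaningless.
    f = e * np.sqrt(lam)
    return f, lam
```

When all n factors are kept, FF' reproduces R exactly; for example, the 2 x 2 matrix with off-diagonal .5 has roots 1.5 and .5.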

For  a  more  detailed  discussion  see: 

Harry Harman, Modern Factor Analysis, Chicago, University of
Chicago Press, 1960, pp. 154-191.

II.   Restrictions

The input matrix for the PRINCIPAL AXIS program must not exceed the
dimensions of 190 x 190 double precision.  The input matrix is further
limited to being a square, symmetric matrix.  Generally correlation,
covariance, or cross-product matrices are used as input data.  It should
be noted that matrices with large numerical entries, such as cross-products,
may generate output values which cannot be printed under the fixed output
formats; the probability of this happening is very small.  Any communality
estimation (i.e., change in the diagonal entries of R) must be done prior
to the input of R to the PRINCIPAL AXIS program.

If communality estimates are used, the user should check the
resulting roots for negative values. If any exist, the associated vectors
are meaningless.

The  input  data  may  come  from  any  source  conforming  to  SOUPAC.   Similarly, 
the  output  codes  follow  the  established  conventions  and  are  specified  at  the 
option  of  the  user. 


The R matrix may be completely factored (i.e., n factors from an
n-variable matrix). However, there are three criteria which may be used
to stop the factoring:
1. The user may specify the number of factors to be extracted.
This criterion provides an upper limit beyond which factoring
will not proceed. Therefore, in cases where it is not the
primary criterion, it should be set to its maximum value
(the order of the input matrix).

2. The percentage of total variance removed from R is the
second limiting criterion. This parameter also specifies
an upper limit to the process. Therefore, it should be set
at 100 per cent unless it is the criterion for stopping.

3. The last criterion is to stop when the factor contribution
(eigenvalue or root) falls below 1. This criterion is applied
only when its parameter is present.

If  all  three  criteria  are  employed  simultaneously,  factoring  is  stopped 
by  whichever  criterion  is  first  met. 
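The interaction of the three criteria can be sketched as a simple loop. The sketch assumes the roots are already in descending order. It reads criterion 2 as keeping the factor at which the cumulative percentage first reaches the limit; SOUPAC's exact boundary handling is not stated above. The function and argument names are hypothetical.

```python
def number_of_factors(roots, max_factors, percent=100, stop_below_one=0):
    """Count the factors retained under the three stopping criteria.

    roots: eigenvalues of R in descending order (their sum is the trace of R).
    """
    total = sum(roots)
    removed = 0.0
    count = 0
    for root in roots:
        if count >= max_factors:                    # criterion 1: factor limit
            break
        if stop_below_one > 0 and root < 1.0:       # criterion 3: root below unity
            break
        count += 1
        removed += root
        if 100.0 * removed / total >= percent:      # criterion 2: variance removed
            break
    return count


roots = [2.5, 1.2, 0.8, 0.5]                            # trace = 5.0
print(number_of_factors(roots, 4))                      # no criterion binds
print(number_of_factors(roots, 4, stop_below_one=1))    # third root is below 1
print(number_of_factors(roots, 4, percent=60))          # 74% removed after two
```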


III.   Parameters 


The parameters for the PRINCIPAL AXIS program appear on the program
call card. They must follow the program name in this order:


Parameter
Number      Use or Meaning

1           Input Address. CARDS or SEQUENTIAL 1-15.

2           Output Address. SEQUENTIAL 1-15 and/or PRINT.

3           Maximum number of factors to be extracted.
            This must be less than or equal to the
            order of the input matrix.

4           The percentage of total variance to be
            removed, expressed as an integer between
            0 and 100.

5           The presence of a number greater than 0
            indicates that factoring should stop when
            the eigenvalues (roots) fall below unity.

            Output Address of Eigenvectors

            The address where the eigenvalues are to be
            placed as a row vector if they must be
            stored for further use. If the values need
            not be saved, leave the parameter blank. PRINT
            is not valid.
Parameter
Number      Use or Meaning

            Mode of sorting eigenvalues and associated
            vectors. The codes are as follows:

                Code    Meaning
                 0      Descending algebraic order
                 1      Descending absolute values
                 2      Order of extraction
                10      Ascending algebraic order
                        (the k smallest roots)
                11      Ascending absolute values
                12      Reverse order of extraction


Leaving any parameter blank is the same as specifying zero. Consequently,
options which are not needed can be omitted by leaving the associated
parameter blank.
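The sort codes can be sketched as follows, assuming the roots and their vectors are held in parallel lists in order of extraction. The function `sort_roots` is hypothetical. A blank parameter is modeled as the default code 0, and the "k smallest roots" refinement of code 10 is not modeled.

```python
def sort_roots(roots, vectors, code=0):
    """Reorder roots and their vectors according to a sort code.

    roots and vectors are parallel lists, given in order of extraction.
    """
    order = list(range(len(roots)))
    if code in (0, 1, 10, 11):
        key = abs if code in (1, 11) else (lambda x: x)
        # codes below 10 sort descending, codes 10 and 11 ascending
        order.sort(key=lambda i: key(roots[i]), reverse=code < 10)
    elif code == 12:
        order.reverse()                  # reverse order of extraction
    elif code != 2:                      # 2 keeps the order of extraction
        raise ValueError(f"unknown sort code: {code}")
    return [roots[i] for i in order], [vectors[i] for i in order]


roots = [0.5, -2.0, 1.5]
vectors = ["v1", "v2", "v3"]
print(sort_roots(roots, vectors, 1))   # ([-2.0, 1.5, 0.5], ['v2', 'v3', 'v1'])
```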




PROCRUSTES  (Oblique  Case) 


General  Description 

This  program  offers  3  options: 

1.   (Oblique) Procrustes. Given A and B, the program solves

$$A T^{*} = B + E$$

for T* in a least-squares sense (i.e., so that

$$T^{*} = (A'A)^{-1} A'B,$$

minimizing tr[E'E]), and then normalizes T* by columns to yield T = T*D so
that diag(T'T) = I. It then computes AT which, in a loose sense, can be
regarded as a least-squares fit of A to B under the restriction that
diag(T'T) = I. It also provides

$$C_f = D_n (T'T)^{-1} D_n$$

where $D_n$ is a normalizing diagonal matrix such that diag($C_f$) = I. If A
gave the cosines between tests, $C_f$ will give the factor intercorrelations.
A must be of full column rank.

2.   Dwyer Extension Analysis. Given $F = R_{tc}$, a centroid or
equivalent matrix of cosines between tests t and uncorrelated factors c,
and $L = R_{cn}$, a matrix of cosines between uncorrelated factors c and
uncorrelated reference vectors n, this program computes

$$Q = T_{tn} = F(F'F)^{-1}L$$

which is used as a post-multiplier on some correlation matrix $R_{et}$
between the tests t in F and some set of extension variables e, giving
$R_{en}$, the cosines of the extension variables e with the reference
vectors n, to the extent that the former can be projected into the
subspace spanned by the latter. This multiplication,

$$R_{en} = R_{et}\,T_{tn},$$

can be performed by use of the MATRIX program.

3.   Left Inverse (transposed). Given A, the program will return

$$Q = A(A'A)^{-1}$$

provided A is of full column rank. Q is the transposed left inverse of A,
which can be used in least-squares applications.
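The three options reduce to a few matrix products. The NumPy sketch below illustrates them and is not the program's code. The function names are hypothetical. `numpy.linalg.solve` is used for $(A'A)^{-1}A'B$ rather than an explicit inverse. $C_f$ is computed as $D_n (T'T)^{-1} D_n$, which is our reading of the normalization that makes diag($C_f$) = I.

```python
import numpy as np


def oblique_procrustes(A, B):
    """Option 1: least-squares T* for A T* = B + E, normalized by columns."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    T_star = np.linalg.solve(A.T @ A, A.T @ B)       # T* = (A'A)^-1 A'B
    T = T_star / np.linalg.norm(T_star, axis=0)      # T = T*D, so diag(T'T) = I
    M = np.linalg.inv(T.T @ T)
    d = 1.0 / np.sqrt(np.diag(M))
    C_f = M * np.outer(d, d)                         # D_n (T'T)^-1 D_n
    return T, A @ T, C_f


def transposed_left_inverse(A):
    """Option 3: Q = A (A'A)^-1, so that Q'A = I (A of full column rank)."""
    A = np.asarray(A, dtype=float)
    return A @ np.linalg.inv(A.T @ A)


def dwyer_extension(F, L):
    """Option 2: Q = T_tn = F (F'F)^-1 L."""
    return transposed_left_inverse(F) @ np.asarray(L, dtype=float)


rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))
T0 = rng.standard_normal((3, 3))
T, AT, C_f = oblique_procrustes(A, A @ T0)           # B lies exactly in span(A)
```

Because B is constructed as $AT_0$ here, T recovers the columns of $T_0$ scaled to unit length. Multiplying some $R_{et}$ by the result of `dwyer_extension` gives $R_{en}$, as described for the MATRIX program.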


II.   Restrictions 


Input  is  restricted  to  matrices  (A,  B,  or  F)  of  order  190  x  50  or  less, 
III.   Parameters


The parameters for this program appear on the program call card.
They must follow the program name in this order:


Parameter 

Number 

Use  or  Meaning 

Procrustes 

DEA 

LINV

1 

Input  Address 
CARDS  or 
SEQUENTIAL 1-15.

A 

F 

A 

2 

Input  Address 
CARDS  or 

SEQUENTIAL 1-15.

B 

L 

3 

Output  Address 
SEQUENTIAL  1-15 
and/or  PRINT. 

A 

F 

A 

Output  Address 
SEQUENTIAL  1-15 
and/or  PRINT. 


Output  Address 
SEQUENTIAL  1-15 
and/or  PRINT. 


Q 


A(A'A)^-1


Output  Address 
SEQUENTIAL  1-15 
and/or  PRINT. 


Output  Address 
SEQUENTIAL  1-15 
and/or  PRINT. 


AT 


Output  Address 
SEQUENTIAL 1-15
and/or  PRINT. 


E 


Choice  Address 


SQUARE  ROOT  FACTOR  ANALYSIS 

General  Description 

The SQUARE ROOT method of factor analysis, also called the Diagonal
Method, of L. L. Thurstone, decomposes a correlation matrix R (or any
other symmetric positive definite or positive semidefinite matrix) such that

$$R = F F' + R_{(k+1)}$$

where $R_{(k+1)}$ is the residual matrix after extracting k factors. If
all n factors are extracted, the residual matrix becomes a null matrix.

The factor $f_j$ is computed by dividing each element of the j-th column
of R by the square root of its diagonal element:

$$f_{ij} = \frac{r_{ij}}{\sqrt{r_{jj}}}, \qquad i = 1, 2, \dots, n$$

The matrix $\Delta = f_j f_j'$ is then subtracted from R, and the operation
is repeated on the residual matrix.

Prior to the widespread use of high-speed computers, the SQUARE ROOT
method was sometimes used as a substitute for the PRINCIPAL AXIS or CENTROID
method, because a square root factor is relatively easy to compute. When
used in this way, one seeks to extract the maximum variance for each factor,
in which case Parameter 4 should be blank. The program then selects
the next column on the basis of the largest residual column sum of squares.
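The procedure just described can be sketched in plain Python. The function name and the `pivots` argument, which plays the role of Parameter 4, are hypothetical. When no pivots are given, the sketch applies the largest residual column sum of squares literally as the pivot rule.

```python
import math


def square_root_factors(R, k, pivots=None):
    """Extract k square root (diagonal method) factors from symmetric R.

    pivots: optional list of column indices, one per factor. Without it,
    each pivot is the column with the largest residual sum of squares.
    Returns the factors (each a list of n loadings) and the residual matrix.
    """
    n = len(R)
    resid = [list(map(float, row)) for row in R]
    factors = []
    for step in range(k):
        if pivots is not None:
            j = pivots[step]
        else:
            j = max(range(n), key=lambda c: sum(resid[i][c] ** 2 for i in range(n)))
        if resid[j][j] <= 0.0:
            raise ValueError(f"pivot {j} has a non-positive residual diagonal")
        root = math.sqrt(resid[j][j])
        f = [resid[i][j] / root for i in range(n)]      # f_ij = r_ij / sqrt(r_jj)
        factors.append(f)
        for a in range(n):                              # residual: R - f f'
            for b in range(n):
                resid[a][b] -= f[a] * f[b]
    return factors, resid


R = [[1.0, 0.6, 0.3],
     [0.6, 1.0, 0.2],
     [0.3, 0.2, 1.0]]
factors, resid = square_root_factors(R, 1)
print(factors[0])   # [1.0, 0.6, 0.3]: column 0 has the largest sum of squares
```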

Nowadays,  however,  the  SQUARE  ROOT  method  is  more  likely  to  be  used 
for special purposes. By selecting successive pivot variables, the user
retains  control  over  the  factoring.  Factors  are  passed  directly  through 
the  test  variables  and  the  effect  of  these  variables  is  removed  from  the 
matrix.  The  communalities  or  row  sums  of  squares  are  the  squared  multiple 
correlations  of  the  remaining  variables  with  the  pivot  variables. 

The pivots selected may be any columns in the matrix. Let us assume,
however, that these are adjacent to each other in the upper left-hand
corner of the partitioned matrix below:

$$R = \begin{bmatrix} R_{pp} & R_{ps} \\ R_{sp} & R_{ss} \end{bmatrix}$$

Then the effect of pivoting successively on the variables in the upper
left-hand corner is shown by the residual matrix as follows:

$$\begin{bmatrix} 0 & 0 \\ 0 & R_{ss} - R_{sp} R_{pp}^{-1} R_{ps} \end{bmatrix}$$
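This result can be checked numerically. Pivoting on the first p variables with the square root formula should leave exactly $R_{ss} - R_{sp}R_{pp}^{-1}R_{ps}$ in the lower right. The row sums of squares of the pivot factors should then equal the squared multiple correlations. The sketch uses NumPy with a randomly generated correlation matrix, and `pivot_out` is a hypothetical name.

```python
import numpy as np


def pivot_out(R, pivots):
    """Square root factors on the given pivot columns, plus the residual."""
    resid = np.array(R, dtype=float)
    factors = []
    for j in pivots:
        f = resid[:, j] / np.sqrt(resid[j, j])   # f_ij = r_ij / sqrt(r_jj)
        factors.append(f)
        resid = resid - np.outer(f, f)           # subtract f f'
    return np.column_stack(factors), resid


rng = np.random.default_rng(0)
R = np.corrcoef(rng.standard_normal((50, 5)), rowvar=False)
p, s = [0, 1], [2, 3, 4]
F, resid = pivot_out(R, p)
schur = R[np.ix_(s, s)] - R[np.ix_(s, p)] @ np.linalg.solve(R[np.ix_(p, p)], R[np.ix_(p, s)])
print(np.allclose(resid[np.ix_(s, s)], schur))   # True
```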


II.   Restrictions 


A.   Dimension 


Maximum size of the R matrix is 190 variables,

B.   Special Conditions

1. The researcher may specify the extraction of any number of
factors up to the order of R.

2. The researcher may specify the diagonal element to be used
in the extraction of each factor, or he may have the procedure
remove the maximum variance each time.

3. The residual matrix may be saved if the researcher desires.


III.   Parameters 


Following the program name, the parameters must appear in the following
order on the program call card:


Parameter
Number      Use or Meaning

1           Input Address. CARDS or SEQUENTIAL 1-15.
            (Correlation or other positive definite or
            semidefinite matrix.)

2           Output Address. SEQUENTIAL 1-15 and/or PRINT.

3           Number of factors extracted.

4           Input Address for row vector specifying order
            of diagonal elements to be used in factor
            extraction. Optional. CARDS or SEQUENTIAL 1-15.

5           Output Address for residual matrix.
            SEQUENTIAL 1-15 and/or PRINT.


IV.   Special Comments


If the diagonal element for each factor is specified, and if both
input addresses are cards, then the data cards must precede the diagonal
specification cards.


V.   Example 


Assume you have a 20 x 20 correlation matrix on cards, that you
want to extract 15 factors, and that you are reading the pivot columns from
cards. The program would be set up as follows:

     /*ID
     //  EXEC  SOUPAC
     //SYSIN  DD  *
     SQU(C)(P)(15)(C)(P)
     END  SOUPAC
     DATA(20)(8F9.7)
        (data cards)
     END  #
     DATA(15)(15I2)
        (diagonal specification card(s))
     END  #


THREE-MODE  FACTOR  ANALYSIS 


General  Description 

A.  GENERAL COMMENTS

This program provides a factor analytic solution for a three-dimensional
i by j by k data matrix. The computational procedures employed are
those presented in Method III of Tucker's article (reference below).
This method is most efficient when one of the modes, usually
individuals, is quite large, though this is not a necessary condition.

B.  THE  THEORETICAL  MODEL 

Here, i, j, and k represent the modes of classification which are
directly related to the observation of the data; i, j, and k are thus
termed observational modes. An example would be the observation of
scores for i individuals on j tests given under k different conditions.

Through  factoring,  we  wish  to  reduce  the  observational  modes  i,  j, 
and  k  to  corresponding  derivational  modes  m,  p,  and  q.   Each  of  the 
derivational  modes  can  be  thought  of  as  a  set  of  factors  in  the  domain 
of  the  corresponding  observational  mode.   The  core  matrix  G  then  serves 
to  describe  the  relationships  among  the  derivational  modes. 


The fundamental three-mode factor analysis model is represented by
the equation:

$$\hat{x}_{ijk} = \sum_{m}\sum_{p}\sum_{q} a_{im}\, b_{jp}\, c_{kq}\, g_{mpq}$$

where $\hat{x}_{ijk}$ is an approximation to the observed score $x_{ijk}$;
$a_{im}$, $b_{jp}$, and $c_{kq}$ are entries in the two-mode matrices
${}_iA_m$, ${}_jB_p$, and ${}_kC_q$, describing the elements in the
observational modes i, j, and k in terms of the dimensions in the
derivational modes m, p, and q, respectively; and the coefficients $g_{mpq}$
are entries in a three-dimensional matrix G and represent the measures of
the phenomenon being observed for each combination of the dimensions of the
derivational modes.

In matrix form, the model can be represented as:

$${}_i\hat{X}_{(jk)} = {}_iA_m \; {}_mG_{(pq)} \; \left({}_pB_j \otimes {}_qC_k\right)$$

where ⊗ indicates a Kronecker product, and ${}_pB_j$ and ${}_qC_k$ are the
transposes of ${}_jB_p$ and ${}_kC_q$. Matrices A, B, and C are factor
solutions for modes i, j, and k, respectively, which serve to transform the
core matrix G of the three derivational modes to the matrix X representing
the three observational modes.
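The scalar and matrix forms of the model can be checked against each other with NumPy. This sketch uses randomly generated A, B, C, and G, so none of it is SOUPAC output, and the function names are hypothetical. `numpy.kron(B, C)` is transposed to obtain the Kronecker product of the transposes, with k nested within j as in the input matrix.

```python
import numpy as np


def three_mode_scores(A, B, C, G):
    """x_ijk = sum over m, p, q of a_im * b_jp * c_kq * g_mpq."""
    return np.einsum("im,jp,kq,mpq->ijk", A, B, C, G)


def three_mode_matrix_form(A, B, C, G):
    """The i by (jk) form A G (B' kron C'), with k nested within j."""
    m, p, q = G.shape
    G_m_pq = G.reshape(m, p * q)            # core matrix with q nested within p
    return A @ G_m_pq @ np.kron(B, C).T     # kron(B, C)' equals kron(B', C')


rng = np.random.default_rng(0)
n_i, n_j, n_k, n_m, n_p, n_q = 6, 4, 3, 2, 2, 2
A = rng.standard_normal((n_i, n_m))
B = rng.standard_normal((n_j, n_p))
C = rng.standard_normal((n_k, n_q))
G = rng.standard_normal((n_m, n_p, n_q))
X = three_mode_scores(A, B, C, G)           # shape (i, j, k)
print(np.allclose(X.reshape(n_i, n_j * n_k), three_mode_matrix_form(A, B, C, G)))  # True
```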
C.   INPUT DATA

The input data must be a Gramian matrix, usually correlations,
covariances, or cross-products, in the form ${}_{jk}R_{jk}$, where i is
assumed to be the largest mode and jk represents the combination mode, with
mode k nested within mode j.


II.   Output

The output consists of the following:

(1)  the ${}_jP_j$ and ${}_kQ_k$ matrices, which represent the correlations,
covariances, or cross-products within modes j and k, respectively;

(2)  the eigenvalues and principal axis factors of ${}_{jk}R_{jk}$;

(3)  the eigenvalues and eigenvectors of ${}_jP_j$;

(4)  the eigenvalues and eigenvectors of ${}_kQ_k$;

(5)  the core matrix ${}_mG_{(pq)}$, where m, p, and q represent the
derivational modes corresponding to observational modes i, j, and
k, respectively.

All  of  this  is  printed  out  and  may  also  be  stored  on  sequential 
storage  devices.   The  user  must  specify  the  number  of  factors  to  be 
extracted  from  each  of  the  three  modes.   This  procedure  is  employed 
since the use of other factor-stopping criteria (e.g., percent of
variance  accounted  for,  or  eigenvalues  below  unity)  could  easily  lead 
to  the  computation  of  a  great  many  useless  factors  as  well  as  a  very 
large  and  unmanageable  core  matrix.   The  user  is  also  cautioned  against 
specifying  large  numbers  of  factors  since  this  would  cause  substantial 
increases  in  time  required  to  factor  the  various  modes  and  compute  the 
core  matrix. 

It  is  strongly  recommended  that  the  user  be  familiar  with  the 
Tucker  article  and  with  factor  analysis  in  general  before  attempting  to 
use  this  program. 

III.   Parameters

The program mnemonic (T-M) appears first on the program card and
is followed by 17 parameters, the first 8 of which are
required. Output addresses are optional. All output is printed.

Parameter
Number      Description

1           Input address of R matrix in the form ${}_{jk}R_{jk}$
            (CARDS or S1-S15).
2           Number of subjects or elements in mode i.
3           Number of variables in mode j.
4           Number of variables in mode k.
5           Number of factors to be removed from matrix R (mode i).
6           Number of factors to be removed from matrix P (mode j).
7           Number of factors to be removed from matrix Q (mode k).
8           Scratch address (see special comment).
9           Output address for row vector of eigenvalues of R.
10          Output address for principal axis factors of R.
11          Output address for matrix ${}_jP_j$.
12          Output address for row vector of eigenvalues of P.
13          Output address for eigenvectors of P.
14          Output address for matrix ${}_kQ_k$.
15          Output address for row vector of eigenvalues of Q.
16          Output address for eigenvectors of Q.
17          Output address for core matrix ${}_mG_{(pq)}$.


IV.   Special Comment


The three-mode factor analysis program requires three separate internal
storage areas. SOUPAC, however, has only two such areas available within
its programs. Thus, the user must supply a scratch address. This can be
any sequential file (S1-S15) not used as another parameter in the three-mode
program. Any data previously stored on this file will be destroyed.


V.   Reference

Tucker, Ledyard R. Some mathematical notes on three-mode factor analysis.
Psychometrika, 1966, 31, 279-311.


UNRESTRICTED  MAXIMUM  LIKELIHOOD  FACTOR  ANALYSIS 

Parameter
Number      Use or Meaning

1           Input Address for correlation matrix.
            SEQUENTIAL 1-15; CARDS are not permitted.

2           Output Address for final unrotated factor matrix.
            SEQUENTIAL 1-15. See also Parameter 13.

3           Input Address for row vector of initial
            estimates of uniqueness. CARDS or SEQUENTIAL
            1-15 (optional).

4           Lower bound for number of factors.

5           Upper bound for number of factors.

6           Sample size (number of observations) on
            which the correlation matrix is based.

7           Maximum number of iterations.

8           Probability of chance occurrence, e.g., *1.00*.

9           1 to print the input correlation matrix and partial
            correlation matrices after any variables have
            been removed.

10          1 to print technical output.

11          1 to print intermediate results.

12          1 to punch unrotated factor matrices.

13          1 to apply a varimax rotation to all factor
            matrices. If this parameter is used, the output
            of Parameter 2 will be a rotated factor matrix.


This program has been taken directly from Jöreskog (1967) with his
permission. Anyone interested in the methods is referred to the references listed
below. The program is temporarily limited to 75 variables and 30 factors.
Parameters 1, 4, 5, 6, 7, and 8 are required. Parameter 8 must be enclosed
within asterisks and must have a punched decimal point.

References : 

Jöreskog, K. G. UMLFA - a computer program for unrestricted maximum likelihood
factor analysis. Research Memorandum 66-20. Princeton, New Jersey:
Educational Testing Service. Revised Edition, 196?.

Jöreskog, K. G. Some contributions to maximum likelihood factor analysis.
Psychometrika, 1967, 32, 443-482.


VARIMAX  FACTOR  ROTATION 


I .   General  Description 

VARIMAX ROTATION is used to redistribute the variance of a factor matrix
(principal axis, centroid, etc.) so that the matrix approaches orthogonal
simple structure. The varimax scheme maximizes the following criterion
function:


$$V = \sum_{s=1}^{f}\left[\, n\sum_{j=1}^{n}\left(\frac{a_{js}}{h_j}\right)^{4} - \left(\sum_{j=1}^{n}\frac{a_{js}^{2}}{h_j^{2}}\right)^{2}\right]$$

where  j        is the variable index number: 1, ..., n
       s        is the factor index number: 1, ..., f
       $a_{js}$ is the factor loading of the j-th variable on the s-th factor
       $h_j^2$  is the communality of the j-th variable

For  further  discussion  see: 

H. F. Kaiser, "Computer Program for Varimax Rotation in Factor
Analysis," Educational and Psychological Measurement, Vol. XIX,
No. 3, 1959, pp. 413-420.

Cooley and Lohnes, Multivariate Procedures for the Behavioral
Sciences, New York, John Wiley and Sons, Inc., 1962, pp. 161-163.
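For illustration, the rotation can be computed with Kaiser's pairwise procedure. It rotates each pair of factors through the plane angle that maximizes the criterion and sweeps over all pairs until the angles become negligible. This NumPy sketch is an assumption about method, not the SOUPAC code. The names, sweep limit, and tolerance are hypothetical, and `raw=True` is meant to mirror the raw VARIMAX option.

```python
import numpy as np


def varimax(A, raw=False, max_sweeps=50, tol=1e-10):
    """Orthogonal varimax rotation by Kaiser's pairwise plane rotations.

    raw=False divides each row by its communality root h_j first (normal
    varimax); raw=True rotates the loadings as given. Returns B = A T and T.
    """
    A = np.asarray(A, dtype=float)
    n, f = A.shape
    h = np.ones((n, 1)) if raw else np.sqrt((A ** 2).sum(axis=1, keepdims=True))
    L = A / h
    T = np.eye(f)
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(f - 1):
            for j in range(i + 1, f):
                u = L[:, i] ** 2 - L[:, j] ** 2
                v = 2.0 * L[:, i] * L[:, j]
                num = 2.0 * (u @ v) - 2.0 * u.sum() * v.sum() / n
                den = (u @ u - v @ v) - (u.sum() ** 2 - v.sum() ** 2) / n
                phi = 0.25 * np.arctan2(num, den)   # angle maximizing the criterion
                c, s = np.cos(phi), np.sin(phi)
                rot = np.array([[c, -s], [s, c]])
                L[:, [i, j]] = L[:, [i, j]] @ rot   # rotate the pair of factors
                T[:, [i, j]] = T[:, [i, j]] @ rot   # accumulate the transformation
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return L * h, T


S = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
              [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])   # simple structure
theta = np.pi / 6
rot30 = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
B, T = varimax(S @ rot30)                             # rotated 30 degrees away, then back
print(np.allclose(B, S))                              # True
```

Rows with zero communality would make the normal form divide by zero, and the sketch does not guard against that.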

Restrictions 


The  number  of  factors  may  be  anything  greater  than  or  equal  to  2. 
Any  factor  matrix  generated  by  a  statistical  system  factor  analysis 
program  is  acceptable  input.   A  matrix  may  also  be  entered  from  cards. 
If  this  is  the  case,  the  number  of  rows  in  the  matrix  must  be  specified 
on  the  data  format  card. 

Parameters 


The parameters for the VARIMAX ROTATION appear on the program call
card. They must follow the program name in this order:


Parameter
Number      Description

1           Input Address. CARDS or SEQUENTIAL 1-15. If
            CARDS are used, the DATA card must contain the
            number of rows as well as the number of
            columns in the input matrix (see User's Guide
            for details).

2           Output Address. SEQUENTIAL 1-15 and/or PRINT.

3           The presence of a number greater than 0 in this
            parameter indicates that the communalities should be
            printed.

4           0 or blank for normal VARIMAX; 1 if raw VARIMAX
            is desired.

5           Output Address of transformation matrix T.
            SEQUENTIAL 1-15 and/or PRINT.


VARISIM  ROTATION 


General  Description 

Given  an  input  factor  matrix  A,  this  program  computes  by  an  iterative 
procedure  an  orthonormal  transformation  matrix  T  to  rotate  A  to  simple 
structure  by  the  equation 

AT  =  B. 

Thus, the program provides an orthogonal rotation of a factor matrix A of
uncorrelated factors to a factor matrix B of uncorrelated factors. The
mathematical criterion employed is quite similar to Kaiser's Varimax
criterion; however, there are definite contrasts between the two
techniques. First, the Varisim program is considerably slower than the
Varimax program, often taking three to four times as long to obtain
convergence. The time ratio depends on the size of the factor matrix,
with only small differences for small factor matrices. Second,
the two methods lead to quite different solutions in the case of a large
factor matrix, though results are often quite similar for small matrices.
In general, the factors rotated by the Varisim program are characterized
by more even contributions to the common test variance than the
corresponding Varimax factors. That is, while Varimax attempts to concentrate
the variance accounted for on the first few factors, Varisim distributes
this variance more evenly across the factors. Naturally,
Varimax, by definition, provides greater simplicity of structure, though
usually only slightly greater than that of Varisim.

Restrictions 


If input is from cards, the number of rows in the input factor
matrix must be specified on the data format card.

Parameters 


The program mnemonic VSM appears first on the card and is followed
by up to 6 parameters.


Parameter
Number      Description

1           Input address of factor matrix A. CARDS
            or SEQUENTIAL 1-15.

2           Output address of rotated factor matrix B.
            SEQUENTIAL 1-15 and/or PRINT.

3           Output address of transformation matrix T.
            SEQUENTIAL 1-15 and/or PRINT.

4           Maximum number of iterations; if left
            blank, 100 iterations will be the stopping
            criterion. (See special comment.)

5           Exponent of convergence n, where tolerance
            ETA = 10^-n. If left blank, n is set to 3.
            (See special comment.)

6           Output address of factor matrix A. SEQUENTIAL
            1-15 and/or PRINT, or left blank if not
            desired.


IV.   Special  Comment 


Only in rare situations will the limit of 100 iterations be
approached. In most cases, 10 to 20 iterations will be sufficient.
The default convergence criterion of $10^{-3}$ should be adequate in all
cases, since it has been found to lead to an unambiguous and unique
solution. Thus, for virtually any standard input factor matrix,
parameters 4 and 5 can be left blank.


TUCKER-MESSICK  POINTS  OF  VIEW  ANALYSIS 


I.   General Comments

One  of  the  classic  experimental  designs  in  perceptual  research 
involves  subjects  making  similarity  judgments  for  all  possible  pairs 
of  a  given  set  of  stimuli.   The  Tucker-Messick  model  was  developed 
to  analyze  such  data  in  order  to  discover  whether  particular  groups 
of  individuals  have  different  viewpoints  about  stimulus  interrelation- 
ships.  Results  of  the  analysis  show  what  different  viewpoints  are 
present  within  the  sample  and  the  extent  to  which  each  individual  uses 
each  point  of  view. 

Since its conception, however, the technique has been applied to a
wide variety of data for purposes of examining individual differences
in judgment or performance. This is possible because, for any
set of measurements for N individuals on n variables, the model specifies
the dimensions of greatest variation among individuals and the extent
to which each individual is characterized by each dimension. The
technique can be of great value in studying individual differences in many
different situations.

II.   Description of the Mathematical Model

The  present  description  of  the  model  will  be  based  on  a  data  matrix 
containing  measurements  on  or  by  N  individuals  on  n  stimuli  or  variables. 
(Note:   these  stimuli  can,  in  fact,  be  stimulus  pairs  in  a  paired- 
comparison  judgment  situation.)   This  program  assumes  input  of  an  N  x  n 
data  matrix,  X'.   The  crucial  question  which  the  model  attempts  to  answer 
is  whether  there  exists  consistent  covariation  among  groups  of  individuals 
on  the  n  variables.   The  question  is  resolved  by  factoring  X'  into  its 
principal  components.   The  factor  solution  of  X'  indicates  the  number  of 
dimensions  required  to  account  for  individual  differences  in  performance 
or judgments. For judgment data, this represents the number of
consistent viewpoints being used within the sample. The final result of this
procedure consists of one matrix which specifies the dimensions of the
stimulus space accounting for greatest individual variation, and another
matrix specifying the extent to which each observed individual is
characterized by each of these dimensions.

The notation below corresponds to the notation in the Tucker-Messick
article. Since X is rectangular, and hence not symmetric, it cannot be
factored directly. Thus, we use the Eckart-Young procedure to construct a
matrix $X_r$ which approximates X but is of lower rank. $X_r$ is constructed
from the r largest characteristic roots and vectors of X, according to the
Eckart-Young model:

$$X_r = U_r\,\Gamma_r\,W_r' \tag{1}$$

These components are computed from the cross-products matrix P,

$$P = X X' \tag{2}$$

as follows: analyze P into principal components, i.e.,

$$P = U\,\Gamma^{2}\,U' \tag{3}$$

and truncate to the r desired characteristic roots and vectors. We can
now solve equation (1) for $W_r$:

$$W_r' = \Gamma_r^{-1}\,U_r'\,X \tag{4}$$


Elements  in  W  represent  projections  of  points  corresponding  to  indi- 
viduals on  the  unit-length  principal  vectors  of  X.   Elements  in  U 
represent  projections  of  points  corresponding  to  stimuli  on  the  unit 
length  principal  vectors  of  X. 

Since each vector of W is composed of N elements and is of unit
length, it is obvious that the loadings, or projections of individuals,
are dependent on sample size N. We can rescale W into a matrix V:

$$V_r = N^{1/2}\,W_r \tag{5}$$

Then the coefficients in V will be independent of sample size. To
maintain the Eckart-Young relationship, we must also rescale U:

$$Y_r = U_r\,N^{-1/2} \tag{6}$$

Equation (1) can now be rewritten:

$$X_r = Y_r\,\Gamma_r\,V_r' \tag{7}$$


Elements of V and Y represent scaled projections of individuals and
stimuli, respectively, on the principal vectors of X. Matrix V can then
be converted to a factor matrix A of scaled projections of individuals
on principal factors by weighting each vector by the square root of
the corresponding eigenvalue:

$$A = \Gamma_r\,V_r' \tag{8}$$

Combining equations (7) and (8), we find that:

$$X_r = Y_r\,A \tag{9}$$

This equation represents the final result of the present program.

III.   Output and Further Analyses

1.   Data matrices computed: This program will compute and print,
or store on request, the matrices P, W', U, Y, V, and A', along with the
eigenvalues of P. Printed output includes explanatory labels and
equations.


2.  Rotation: Y and A might be rotated so that the inherent dimensions
of individual variation will be in positions more appropriate for
psychological interpretation. A transformation matrix T can be derived
to rotate to simple structure, e.g., by the Varimax criterion:

$$B = T A \tag{10}$$

and

$$Z = Y T^{-1} \tag{11}$$

We then have, combining equations (9), (10), and (11),

$$X_r = Y T^{-1} T A = Z B \tag{12}$$

and the Eckart-Young theorem is still satisfied.

3.  Person  Space  Plots:   Since  entries  in  matrix  B  represent  co- 
ordinates of  points  for  individuals  on  rotated  axes,  this  space  may  be 
readily  plotted  graphically  by  making  scatter  plots  for  each  possible 
pair  of  axes.   This  may  be  done  by  hand  or  by  inputting  matrix  B  to  the 
SOUPAC  program  SCATTER  PLOTS.   Plots  could  also  be  made  prior  to  rota- 
tion on  the  basis  of  matrix  A.   These  plots  can  be  a  crucial  step  in  the 
analysis  of  homogeneous  subgroups  or  of  widely  deviant  individuals  in 
the  data. 

4.   Correlating dimensions with outside variables: Since each
individual receives a score on all r derived dimensions, correlations
may  be  computed  between  these  dimensions  and  scores  on  other  outside 
measures — perhaps  personality  or  performance  measures — in  order  to 
ascertain  properties  and  correlates  of  the  derived  dimensions.   This 
would  be  accomplished  by  augmenting  the  matrix  A  (or  B)  with  a  matrix 
of  measures  for  the  same  individuals  on  other  variables  and  computing  a 
correlation  matrix  from  this  data. 

5.   Idealized Individuals: The user can insert, at any desired
location on the plots of the factor space of individuals, additional
points which can be interpreted as "idealized individuals." Their
location can be determined from any desired criterion, and any number
of idealized points can be inserted into the person space. These points
of interest may represent centroids of clusters, deviant individuals,
etc. By combining the Tucker-Messick program with the MATRIX program,
the user can reconstruct raw data for the conceptual individuals based
on the r dimensions retained. This would be done as follows:

a)  read the coordinates of each ideal point on each factor;
b)  record the r coordinates of each point in a row vector;
c)  adjoin these row vectors for g idealized individuals into
    a matrix G';
d)  punch the matrix G' and, using the MATRIX program, store
    it on a sequential file;
e)  store the matrix Y as output from the present program;
f)  using the MATRIX program, transpose Y to get Y' and compute
    the reconstructed data matrix $X_g'$ via the following
    multiplication:

$$X_g' = G'\,Y'$$

This can be done after rotation by substituting rotated dimensions
(Z') for unrotated dimensions (Y').
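The computations of equations (2) through (9), together with the idealized-individual reconstruction, can be sketched in NumPy. The sketch assumes P is the raw cross-products matrix; the correlation and covariance options are not modeled. It uses `numpy.linalg.eigh` for the principal components of P. The function name `points_of_view` and the idealized coordinates in `G_t` are hypothetical.

```python
import numpy as np


def points_of_view(Xt, r):
    """Xt is the N x n input X' (individuals by variables); keep r dimensions."""
    X = np.asarray(Xt, dtype=float).T          # X is n x N
    N = X.shape[1]
    P = X @ X.T                                # (2) P = X X'
    roots, U = np.linalg.eigh(P)               # (3) P = U Gamma^2 U'
    keep = np.argsort(roots)[::-1][:r]         # the r largest roots
    gamma = np.sqrt(roots[keep])               # diagonal of Gamma_r
    U_r = U[:, keep]
    W_rt = (U_r.T @ X) / gamma[:, None]        # (4) W_r' = Gamma_r^-1 U_r' X
    V = np.sqrt(N) * W_rt.T                    # (5) V_r = N^(1/2) W_r
    Y = U_r / np.sqrt(N)                       # (6) Y_r = U_r N^(-1/2)
    A = gamma[:, None] * V.T                   # (8) A = Gamma_r V_r'
    return P, roots[keep], Y, V, A


rng = np.random.default_rng(1)
Xt = rng.standard_normal((30, 6))              # 30 individuals, 6 variables
P, roots, Y, V, A = points_of_view(Xt, r=2)
X_r = Y @ A                                    # (9) rank-2 approximation to X
G_t = np.array([[1.0, 0.0],
                [0.5, -0.5]])                  # hypothetical idealized individuals
X_g_t = G_t @ Y.T                              # reconstructed data X_g' = G' Y'
```

Because $Y_r A = U_r U_r' X$, the product matches the rank-r truncated singular value decomposition of X, which is the Eckart-Young approximation.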

IV.   Special Comments

(1)  Input: The input matrix is denoted X' and is composed of
N rows representing individuals and n columns representing variables
or stimuli.

(2)  Eckart-Young approximation of X': By outputting matrices
U and W (parameters 6 and 7) along with the eigenvalues of P, the
user can obtain the fundamental Eckart-Young resolution of the raw
data matrix. By performing the proper matrix manipulations (equation
(1)), one can obtain an Eckart-Young approximation to X based on r
dimensions.

(3)  Specifying  the  number  of  factors:   The  number  of  dimensions 
to  be  retained  must  be  specified  in  parameter  3.   This  decision  is 
usually  based  on  a  preliminary  computer  run  which  computes  and  factors 
the  matrix  P;  the  number  of  dimensions  retained  is  determined  by  the 
resulting  series  of  eigenvalues. 

(4) Type of P-matrix to be used in analysis: Although the
mathematical model is written in terms of a cross-products matrix P,
this matrix may also be composed of correlations or covariances.
This option is specified in parameter 2.

(5) Multi-dimensional scaling: This technique was originally
developed for use with judgment data, in conjunction with
multi-dimensional scaling analyses. If the user wishes to proceed in this
direction, it is strongly recommended that he have at least a
fundamental understanding of the Tucker-Messick model as well as the general
concepts involved in multi-dimensional scaling. The SOUPAC office has
information about, and access to, the widely used TORSCA program,
which is commonly used in conjunction with the Tucker-Messick model.
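As a sketch of the three choices offered by parameter 2, the function below forms a cross-products, correlation, or covariance matrix among the columns of a data matrix. Two details are assumptions of this sketch, not statements of the program description: covariances use the divisor N, and the orientation of X' that is factored is left to the caller (transpose the data first to work between rows instead of columns).

```python
import math


def p_matrix(data, code=1):
    """P matrix among the columns of `data` (a list of rows).

    code 1 = raw cross-products, 2 = correlations, 3 = covariances.
    Covariances use divisor N (an assumption of this sketch).
    """
    n_obs = len(data)
    cols = [list(c) for c in zip(*data)]
    if code == 1:
        return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    if code not in (2, 3):
        raise ValueError("code must be 1, 2 or 3")
    means = [sum(c) / n_obs for c in cols]
    dev = [[v - m for v in c] for c, m in zip(cols, means)]
    cov = [[sum(a * b for a, b in zip(di, dj)) / n_obs for dj in dev] for di in dev]
    if code == 3:
        return cov
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


data = [[1, 2], [2, 4], [3, 6]]      # 3 observations, 2 variables
print(p_matrix(data, 1))              # [[14, 28], [28, 56]]
```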

V.   Parameters 


The program mnemonic VEW appears first on the card and is
followed by up to 10 parameters. Since computations are based on the
parameters that are specified, there should be no blanks in the
parameter string. For example, if the user wants output through
parameter 9, all previous parameters must be specified, because the
program cannot compute the matrix for parameter 9 without first
computing the previous matrices. The exception is parameter 5, which
can be left blank since the eigenvalues are always computed and printed.




Parameter
Number    Description

1         Input address of matrix X'. CARDS or SEQUENTIAL 1-15.

2         Code specifying type of P matrix to be computed:
            (1) cross-products
            (2) correlations
            (3) covariances.
          (1) is default.

3         Number of factors to be extracted from matrix P.

4         Output address of matrix P. SEQUENTIAL 1-15 and/or PRINT.

5         Output address of row of eigenvalues of P. SEQUENTIAL 1-15.
          These are printed automatically.

6         Output address of matrix U: projections of stimuli on unit
          length vectors of X. SEQUENTIAL 1-15 and/or PRINT.

7         Output address of matrix W: projections of individuals on
          unit length vectors of X. SEQUENTIAL 1-15 and/or PRINT.

8         Output address of matrix V: scaled projections of individuals
          on principal vectors of X. SEQUENTIAL 1-15 and/or PRINT.

9         Output address of matrix Y: scaled projections of stimuli on
          principal vectors of X. SEQUENTIAL 1-15 and/or PRINT.

10        Output address of matrix A': scaled projections of individuals
          on principal factors of X. SEQUENTIAL 1-15 and/or PRINT.


VI. References

Tucker, L. R. and Messick, S., "An Individual Differences Model for
Multidimensional Scaling", Psychometrika, 1963, pp. 333-367.


ECONOMETRICS   PACKAGE 




ECONOMETRIC  REDUCED  FORM  AND  RESIDUAL  ANALYSIS 


General  Description 

The ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS program performs
operations on the model $X\beta + Y\Gamma = U$, using the following definitions
(dimensions of matrices are given in parentheses):

T is the number of observations
NY is the number of equations in the model and the number of jointly
dependent variables in the model, since the two must be the same
NX is the number of predetermined variables plus the constant term
N is the number of variables plus the constant term; N = NY + NX
$[X\;Y]_{(T,N)}$ is the raw data matrix plus a column of constant terms
$\begin{bmatrix}\hat\beta \\ \hat\Gamma\end{bmatrix}_{(N,NY)}$ is the matrix of coefficient estimates. This matrix
includes the estimate of the intercept term


The  program  calculates  the  following: 


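(1) Estimate of residuals:

$$\hat U_{(T,NY)} = [X\;Y]_{(T,N)} \begin{bmatrix}\hat\beta \\ \hat\Gamma\end{bmatrix}_{(N,NY)}$$

(2) Durbin-Watson statistic for each equation (i):

$$DW(i) = \frac{\sum_{t=2}^{T}\left[\hat U_i(t) - \hat U_i(t-1)\right]^2}{\sum_{t=1}^{T}\left[\hat U_i(t)\right]^2}$$

(3) Estimate of the variance-covariance matrix of residuals:

$$\hat\Sigma_{(NY,NY)} = \frac{1}{T}\,\hat U'_{(NY,T)}\,\hat U_{(T,NY)}$$

(4) Reduced form estimates:

$$\hat\Pi_{(NX,NY)} = (X'X)^{-1}_{(NX,NX)}\,(X'Y)_{(NX,NY)}$$

(5) Reduced form predicted values:

$$\hat Y_{(T,NY)} = X_{(T,NX)}\,\hat\Pi_{(NX,NY)}$$

(6) Estimate of reduced form residuals:

$$\hat V_{(T,NY)} = Y_{(T,NY)} - \hat Y_{(T,NY)}$$

(7) Estimate of the variance-covariance matrix of reduced form residuals:

$$\hat\Omega_{(NY,NY)} = \frac{1}{T}\,\hat V'_{(NY,T)}\,\hat V_{(T,NY)}$$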
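Quantities (1) to (7) are ordinary matrix arithmetic. The pure-Python sketch below computes the residuals, the Durbin-Watson statistic, a (1/T) moment matrix of residuals, and the reduced-form coefficients. It assumes the coefficients are passed as the stacked N x NY matrix of estimates (the transpose of the K by N card layout described under Restrictions), with the intercept first and a -1 for the normalized variable; all function names and data are illustrative.

```python
def transpose(m):
    return [list(c) for c in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a, b):
    """Solve a x = b (b may have several columns) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def residuals(xy, coef):
    """(1) U_hat = [X Y] times the stacked coefficients; xy has the constant column."""
    return matmul(xy, coef)


def durbin_watson(u):
    """(2) DW for one equation's residual series u(1..T)."""
    num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
    return num / sum(v * v for v in u)


def moment_cov(u):
    """(3) or (7): (1/T) U'U for a T x NY residual matrix."""
    t = len(u)
    return [[v / t for v in row] for row in matmul(transpose(u), u)]


def reduced_form(x, y):
    """(4) Pi_hat = (X'X)^{-1} X'Y, solved without forming the inverse."""
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


# Hypothetical data: columns are [constant, x, y]; one equation 1 + 2x - y = u.
xy = [[1, 0, 1], [1, 1, 4], [1, 2, 5]]
coef = [[1], [2], [-1]]          # intercept, predetermined coefficient, -1 on y
u = residuals(xy, coef)
print(u)                         # [[0], [-1], [0]]
```

Applying `moment_cov` to the reduced-form residuals $Y - X\hat\Pi$ gives item (7).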
II. Restrictions

Only  those  inputs  used  in  the  calculations  called  for  need  be  given. 
They  must  be  in  the  following  formats: 

(1)  Coefficients: 

The coefficient matrix for K equations with N-1 variables, N1
predetermined and N2 jointly dependent, must be a K by N matrix. Each row
corresponds to an equation. The first element in each row is the constant
term, followed by the coefficients (predetermined coefficients first,
jointly dependent coefficients next). Each row must contain a -1 in the
position of the jointly dependent variable that was normalized on.

(2)  Raw  Data: 

The data must be arranged so that predetermined variables occur first
and jointly dependent variables last. (The TRANSFORMATION program may
be used to arrange the data in this way if it is not already so arranged.)

(3)  Raw  Data  Cross-products  Matrix: 

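The cross-products matrix must have the following form, bordered by a row
and a column of variable sums (ΣX):

$$\begin{bmatrix} & \Sigma X \\ \Sigma X & \text{Cross-products} \end{bmatrix}$$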


Care should be taken that an input address is specified for any data
needed in calculating the desired statistics, and that any intermediate
statistics needed are stored (i.e., that an output address other than PRINT
is specified). The following list indicates which previously computed
quantities are needed in the calculation of each statistic.


1. Estimate of Residuals - coefficients and raw data

2. Durbin-Watson statistic - coefficients and raw data

3. Estimate of Covariance matrix of residuals - coefficients, raw data
   and the number of observations

4. Estimated Reduced form coefficients - raw data

5. Reduced form predicted values and residuals - reduced form coefficients
   (no output address) and raw data

6. Estimate of Covariance matrix of reduced form residuals - estimate
   of reduced form residuals and the number of observations


III. Parameters

The parameters appear on the program card following the mnemonic ECON
in the following order (see also Special Comments):

Parameter
Number    Description

1         Input address for coefficients. SEQUENTIAL 1-15.
          Same as output address for K-CLASS.

2         Input address for raw data cross-products matrix.
          SEQUENTIAL 1-15. Same as output address for K-CLASS.

3         Input address for raw data. SEQUENTIAL 1-15.
          (See Special Comments.)

4         Number of predetermined variables (total).

5         Output address for estimates of residuals.
          SEQUENTIAL 1-15 and/or PRINT.

6         If P, the estimate of the variance-covariance matrix
          of residuals is printed.

7         If P, (estimated) reduced forms are calculated and printed.

8         If P, reduced form predicted values and residuals are printed.

9         If P, the estimate of the variance-covariance matrix of
          reduced form residuals is printed.


IV. Special Comments

If Parameter Number 3 is specified, the Durbin-Watson statistic will be
calculated and printed.

The ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS program requires input
from the K-CLASS ESTIMATION program and/or the THREE STAGE LEAST SQUARES
program. The following example illustrates.
V. Example

/*ID       [accounting information to include REGION size]
// EXEC SOUPAC
//SYSIN DD *
TRA(C).
MOV(C)(S4).
END P
K-C(S4)(S2)*1*(7)(S1)(S3)(P).
(3)(1)(1)(2)(3)(8).
(3)(1)(1)(2)(4)(10).
(3)(1)(5)(6)(7)(9).
END P
ECON(S2)(S3)(S4)(7)(S5/P)(P)(P)(P)(P).
END S

Notice that there is no END P card after the ECON program because ECON
has only a main parameter card.

VI.   References 
K-CLASS ESTIMATION


I.   General  Description 

Three estimators belong to the K-class: Ordinary Least Squares
(multiple regression), Two-Stage Least Squares, and Limited Information
Maximum Likelihood.

II. Description of K-Class Output

The K-Class program calculates the basic statistics:

Mean: $\bar X_i = \frac{\sum X_i}{N}$, where N = sample size

Variance-Covariance: $S_{ij} = \frac{\sum X_i X_j}{N} - \bar X_i \bar X_j$

Standard Deviation: $s_i = \sqrt{S_{ii}}$

Correlation: $C_{ij} = \frac{S_{ij}}{s_i s_j}$

Cross-products in matrix notation: $CP = X'X$

K-Class also calculates the eigenvalue to be used in Limited Information
Maximum Likelihood estimation.

K-Class  then  goes  on  to  calculate  estimates  and  associated  statistics  for 
Ordinary  Least  Squares  (OLS),  Limited  Information  Maximum  Likelihood,  and  Two- 
Stage  Least  Squares. 

Formulas

$\beta$ = estimates of the jointly dependent coefficients
$\gamma$ = estimates of the predetermined coefficients
$y_o$ = variable normalized on
$Y$ = jointly dependent variables in the equation
$X_o$ = predetermined variables in the equation
$X$ = predetermined variables in the system
$a$ = intercept term
$e$ = error term

Estimating Formulas

$$\begin{bmatrix}\hat\beta \\ \hat\gamma\end{bmatrix} = \begin{bmatrix} Y'Y - kV'V & Y'X_o \\ X_o'Y & X_o'X_o \end{bmatrix}^{-1} \begin{bmatrix} Y'y_o - kV'y_o \\ X_o'y_o \end{bmatrix}$$

or $\hat\delta = A^{-1}c$,

where
$$V'V = Y'Y - Y'X(X'X)^{-1}X'Y$$
$$V'y_o = Y'y_o - Y'X(X'X)^{-1}X'y_o$$

k determines the estimating technique and is an arbitrary scalar which
may be either random or nonstochastic.

Standard Error of Estimate

$$\hat\sigma^2 = \frac{y_o'y_o - \hat\delta'c}{N - j}$$

where j = rank of A.

Standard Error of the Estimated Coefficients

$\hat\sigma_{ii}$ = the square root of the i-th diagonal element of the $\hat\sigma^2 A^{-1}$ matrix

T-Ratio

$$t_i = \frac{\hat\delta_i}{\hat\sigma_{ii}}$$

Covariance matrix of the coefficients

$$C = \hat\sigma^2 A^{-1}$$

Intercept term:

$$a = \bar y_o - \bar Y\hat\beta - \bar X_o\hat\gamma$$
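A compact sketch of the estimating formulas for one equation follows. It assumes that the moment matrices are formed from deviations about the means (the reading under which the intercept formula above recovers a), that $X_o$ is a subset of $X$, and it uses hypothetical names throughout; k = 0 and k = 1 give OLS and 2SLS, and a LIML root could be passed as k. It repeats small matrix helpers so that it runs on its own.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a, b):
    """Solve a x = b (b a list of rows) by Gauss-Jordan with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def centered(cols):
    """Return (columns as deviations from their means, the column means)."""
    means = [sum(c) / len(c) for c in cols]
    return [[v - m for v in c] for c, m in zip(cols, means)], means


def k_class(y0, y_cols, xo_cols, x_cols, k):
    """k-class estimates (beta, gamma, intercept) for one structural equation."""
    (y0c,), (y0_bar,) = centered([y0])
    yc, y_bar = centered(y_cols)
    xoc, xo_bar = centered(xo_cols)
    xc, _ = centered(x_cols)
    Y, Xo, X = transpose(yc), transpose(xoc), transpose(xc)
    y = [[v] for v in y0c]
    # V = Y - X(X'X)^{-1}X'Y, so V'V and V'y_o agree with the formulas above.
    Xt = transpose(X)
    fit = matmul(X, solve(matmul(Xt, X), matmul(Xt, Y)))
    V = [[u - v for u, v in zip(r1, r2)] for r1, r2 in zip(Y, fit)]
    Yt, Xot, Vt = transpose(Y), transpose(Xo), transpose(V)
    yy, vv = matmul(Yt, Y), matmul(Vt, V)
    top = [[yy[i][j] - k * vv[i][j] for j in range(len(yy))] for i in range(len(yy))]
    A = ([left + right for left, right in zip(top, matmul(Yt, Xo))]
         + [left + right for left, right in zip(matmul(Xot, Y), matmul(Xot, Xo))])
    c = ([[u[0] - k * v[0]] for u, v in zip(matmul(Yt, y), matmul(Vt, y))]
         + matmul(Xot, y))
    delta = [row[0] for row in solve(A, c)]
    beta, gamma = delta[:len(y_cols)], delta[len(y_cols):]
    a = (y0_bar - sum(bj * m for bj, m in zip(beta, y_bar))
         - sum(gj * m for gj, m in zip(gamma, xo_bar)))
    return beta, gamma, a


# Hypothetical data in which y0 = 2 + 0.5*y1 + 3*x1 holds exactly.
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
noise = [0.1, -0.2, 0.0, 0.3, -0.1]
y1 = [u + 2 * v + e for u, v, e in zip(x1, x2, noise)]
y0 = [2 + 0.5 * v + 3 * u for v, u in zip(y1, x1)]
for k in (0.0, 1.0):                       # k = 0: OLS, k = 1: 2SLS
    print(k, k_class(y0, [y1], [x1], [x1, x2], k))
```

Because the structural equation holds exactly in these made-up data, both runs should print estimates equal to beta = 0.5, gamma = 3 and a = 2 up to rounding error.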
III. Ordinary Least Squares (OLS)

When k = 0 is specified, Ordinary Least Squares estimates are computed.
Y is then treated as another predetermined variable in the equation. When
ordinary least squares is specified, the following additional statistics are
supplied.

In matrix notation

$$R^2 = \frac{\hat\delta'c - (\sum y_o)^2/N}{y_o'y_o - (\sum y_o)^2/N} = \frac{\text{Explained Sum of Squares}}{\text{Total Sum of Squares}}$$

Total Sum of Squares: $TSS = y_o'y_o - (\sum y_o)^2/N$

Regression (Explained) Sum of Squares: $RSS = \hat\delta'c - (\sum y_o)^2/N$

Error (Unexplained) Sum of Squares: $ESS = TSS - RSS = y_o'y_o - \hat\delta'c$
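For the simplest case of one regressor plus the intercept, the sketch below evaluates these OLS quantities. It assumes that $\hat\delta$ and $c$ are taken in raw (uncentered) form with the intercept included, i.e. $c = (\sum y_o, \sum x y_o)'$. That is one consistent reading under which the $(\sum y_o)^2/N$ corrections give the usual decomposition. The function name is hypothetical.

```python
def ols_sums_of_squares(x, y):
    """Return (TSS, RSS, ESS, R^2) for y = a + b*x fitted by least squares."""
    n = len(y)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    syy = sum(v * v for v in y)
    # Normal equations [[n, sx], [sx, sxx]] (a, b)' = c with c = (sy, sxy)'.
    det = n * sxx - sx * sx
    a = (sxx * sy - sx * sxy) / det
    b = (n * sxy - sx * sy) / det
    delta_c = a * sy + b * sxy            # delta-hat' c
    tss = syy - sy ** 2 / n               # y_o'y_o - (sum y_o)^2 / N
    rss = delta_c - sy ** 2 / n           # regression (explained) sum of squares
    ess = tss - rss                       # equals y_o'y_o - delta-hat' c
    return tss, rss, ess, rss / tss


tss, rss, ess, r2 = ols_sums_of_squares([1, 2, 3, 4], [1, 3, 2, 4])
```

Here the correlation between x and y is 0.8, so `r2` comes out as 0.64 up to rounding.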


Example: Suppose that there were two possible models that one wanted to
estimate. In one model variable 10 is included and in the other variable 10
is not included.

K-C(S1)(S4)*0*(10)(S2)(S3)(P).
(10)(1)(1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11).
(9)(1)(1)(2)(3)(4)(5)(6)(7)(8)(9)(11).
END P

IV. Limited Information Maximum Likelihood (LIML)

When k = λ is specified, where λ is the smallest eigenvalue associated with
the equation, Limited Information Maximum Likelihood estimates are computed.

The eigenvalue is calculated in the following manner:

Let
$$W_1 = Y'Y - Y'X_o(X_o'X_o)^{-1}X_o'Y$$
$$W = Y'Y - Y'X(X'X)^{-1}X'Y$$

K-Class uses the smallest eigenvalue of the matrix $W_1W^{-1}$.
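When the equation contains two jointly dependent variables (as in the example that follows), $W_1$ and $W$ are 2 x 2, and the smallest eigenvalue of $W_1W^{-1}$ is the smaller root of $\det(W_1 - \lambda W) = 0$, a quadratic in λ. This is a minimal sketch under that 2 x 2 assumption. The function name is hypothetical, and larger equations would need a general eigenvalue routine.

```python
import math


def smallest_liml_root(w1, w):
    """Smaller root of det(W1 - lam * W) = 0 for 2 x 2 matrices W1 and W.

    These roots are the eigenvalues of W1 W^{-1}; W is assumed positive definite.
    """
    (p11, p12), (p21, p22) = w1
    (q11, q12), (q21, q22) = w
    a = q11 * q22 - q12 * q21                          # det(W)
    b = -(p11 * q22 + q11 * p22 - p12 * q21 - q12 * p21)
    c = p11 * p22 - p12 * p21                          # det(W1)
    disc = b * b - 4 * a * c
    if a <= 0 or disc < 0:
        raise ValueError("W must be positive definite and the roots real")
    r = math.sqrt(disc)
    return (-b - r) / (2 * a)                          # smaller root since a > 0


print(smallest_liml_root([[4.0, 0.0], [0.0, 6.0]], [[2.0, 0.0], [0.0, 2.0]]))  # 2.0
```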

Example: Suppose that there is a two-equation system with four predetermined
and two jointly dependent variables. The following program will calculate LIML
estimates for both equations:

K-C(C)(S3)*-1*(4)(S1)(S2)(P)(S5).
(2)(2)(1)(4)(5)(6).
(2)(2)(2)(3)(5)(6).
END P
V. Two-Stage Least Squares (2SLS)

When k = 1 is specified, Two-Stage Least Squares estimates are computed.

Example: Suppose one has a three-equation model that contains three jointly
dependent and ten predetermined variables. The following program will calculate
2SLS estimates.

K-C(C)(S3)*1*(10)(S1)(S2)(P).
(5)(2)(1)(2)(3)(4)(5)(11)(13).
(4)(2)(3)(4)(6)(7)(12)(13).
(5)(2)(1)(2)(8)(9)(10)(11)(12).
END P

VI. Parameters

A. Main Parameters

Parameter
Number    Description

1         Input address. CARDS or SEQUENTIAL 1-15.

2         Output address for estimated coefficients.
          SEQUENTIAL 1-15. Same as input address for ECON.

3         Floating point value of k (see Special Comments).
          This value should be enclosed in asterisks.

4         Number of predetermined variables in the system.

5         Output address for cross-products matrix.
          SEQUENTIAL 1-15 and/or PRINT. Same as input
          address for ECON.

6         Output address for raw data covariance matrix.
          SEQUENTIAL 1-15 and/or PRINT.

7         If P, correlation matrix will be printed.

8         Output address of eigenvalues if LIML. SEQUENTIAL 1-15.

9         Type of input:  0 = raw data
                          1 = cross-products
                          2 = covariance
          If the input is raw data, the ninth parameter may be omitted.

B. Subparameters (Equation Control Cards)

Subparameters are needed for all K-Class programs. There is one subparameter
card for each equation. Each equation card has the following form:

Parameter
Number    Description

1         Number of predetermined variables in the equation.

2         Number of jointly dependent variables in the equation.

3         The variable numbers of all variables in the equation,
          in the order:
            1 - predetermined in the equation
            2 - jointly dependent in the equation
          with the variable normalized on last.

C. DATA cards must be punched with predetermined variables first, jointly
dependent variables last.


VII. Special Comments

A. If k = *0*, Ordinary Least Squares estimates are computed.
   If k = *1*, 2-Stage Least Squares estimates are computed.
   If k = *-1*, Limited Information Maximum Likelihood estimates are computed.

K-Class accepts data from cards or intermediate storage, either as raw
data, cross-products, or covariance. If cross-products or covariance
are used as input to K-Class, the matrix must be in the following order:

Cross-products:
$$\begin{bmatrix} & \Sigma X \\ \Sigma X & \text{cross-products} \end{bmatrix}$$

Covariance:
$$\begin{bmatrix} & \Sigma X \\ \Sigma X & \text{covariance} \end{bmatrix}$$
LINEAR PROGRAMMING


I. General Description

LINEAR PROGRAMMING maximizes or minimizes a linear function subject to
certain linear inequalities called constraints.

In matrix notation:

Find the solution to

$$AX \;(\le,\ =,\ \ge)\; b \quad \text{(a system of linear equations or inequalities)}$$

which maximizes (or minimizes)

$$Z = CX, \quad \text{where } X \ge 0.$$

A is the matrix of coefficients of the constraints, X the vector of
variables, C the vector of costs or profits associated with each variable, and b
a vector or matrix of non-negative constants which places a bound on the linear
equations.

The equations $AX \;(\le,\ =,\ \ge)\; b$ in n variables define and bound a space,
called the feasible space, in which all allowable values of the n variables are defined.
The SIMPLEX criterion finds those combinations of variables which optimize the
objective function within this feasible space. To solve the system of linear
equations defined above, the inequalities must be changed to equalities. This
is accomplished by adding surplus variables to "greater than" constraints
and slack variables to "less than" constraints. To create the basis for solving
a system of linear equations, an identity matrix must be formed and augmented
to the A matrix of structural variables. Creation of the identity matrix is
completed by adding artificial variables to constraints with a "greater
than" relational operator. The program adds any needed variables.

Since there are more variables (structural + surplus + slack + artificial)
than rows, some method must select which variables will be in solution. The
SIMPLEX Algorithm selects a number of variables (equal to the number of rows)
which will be in solution. The final solution is the maximum (or minimum) of
the linear function subject to the constraints. Since slack and surplus
variables have "real" meaning, they may appear in the final and intermediate
solutions. Their presence as a non-zero value indicates that the constraint to
which they were added is not binding. Artificial variables have no "real"
meaning. Presence of artificial variables in solution indicates that some
constraints are so constructed as to preclude a solution which has "real" meaning.

The slack and surplus variables are given costs of zero in the objective
function. Artificial variables are given large negative costs. SIMPLEX attempts
to drive artificial variables from solution.

Failure  to  drive  artificial  variables  from  solution  may  indicate  a  problem 
in  which  constraints  are  mutually  exclusive  or  that  the  cost  assigned  to  the 
artificial  variable  is  not  large  enough. 
In matrix notation the augmented matrix before calculations begin would
appear as:

$$[\,A \mid I \mid S \mid b\,]$$

where I is the identity matrix of slack and artificial variables and S is the
matrix of surplus variables. Row operations are performed on the augmented
matrix according to the SIMPLEX criterion. After any number of row operations,
the inverse matrix of the original coefficients of the structural variables now
in solution is contained in the columns where the original identity matrix was
located. At every stage (row operation) an identity matrix will be present.
This identity matrix indicates the variables in solution.
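The construction of the starting table can be sketched as follows. Each row receives one identity column (a slack variable for "LE", an artificial variable otherwise), and each "GE" row also receives a surplus column with coefficient -1. The column count is therefore constraints + structural variables + "GE" rows, which is the quantity limited to 300 under Restrictions. Two points are assumptions of this sketch rather than statements of the text: "EQ" rows are given an artificial variable, and the objective is treated as a maximization. The default artificial cost -1.E50 is the one listed under Parameters, and the function name is hypothetical.

```python
def initial_table(a, ops, b, costs, artificial_cost=-1.0e50):
    """Build the augmented table [A | I | S | b] plus a cost for every column.

    a: m x n structural coefficients; ops: "LE", "GE" or "EQ" per row;
    b: non-negative right-hand sides; costs: n objective coefficients.
    Returns (rows, column_costs, basis_columns).
    """
    m, n = len(a), len(a[0])
    ge_rows = [i for i, op in enumerate(ops) if op == "GE"]
    if n + m + len(ge_rows) > 300:
        raise ValueError("PROBLEM TOO LARGE")
    col_costs = list(costs) + [0.0] * (m + len(ge_rows))
    rows, basis = [], []
    for i, (coefs, op) in enumerate(zip(a, ops)):
        if op not in ("LE", "GE", "EQ"):
            raise ValueError(f"unknown relational operator {op!r}")
        if b[i] < 0:
            raise ValueError("requirement values must be non-negative")
        identity = [1.0 if j == i else 0.0 for j in range(m)]
        surplus = [-1.0 if g == i else 0.0 for g in ge_rows]
        rows.append(list(coefs) + identity + surplus + [b[i]])
        basis.append(n + i)                      # identity column i starts in the basis
        if op != "LE":
            col_costs[n + i] = artificial_cost   # artificial, not slack
    return rows, col_costs, basis


rows, col_costs, basis = initial_table([[2, 2], [1, 3]], ["LE", "GE"], [5, 3], [1, 1])
```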


Since the original table is stored by the program, it is possible to compare
the inverse obtained through LINEAR PROGRAMMING with the inverse obtained by a
standard inversion technique. The user may set the absolute value for this
comparison in Parameter 3. If the comparison does not meet the accuracy
requirement, a new table is formed using the original table and the calculated
inverse. After a feasibility check, the program continues calculations until an
acceptable solution is obtained.
II. Restrictions

The program is limited to a maximum of 90 rows or constraints, 300 columns
or variables, and 5 columns in the requirement matrix.

These limits are internal limits, and the user is warned that large problems
may exceed the program capacity during the accuracy check and during calculations
involving multiple-column requirement matrices. Program capacity WILL be exceeded
if the number of constraints + number of structural variables + number of "greater
than" inequalities > 300.


Input may come ONLY from CARDS in the form of subparameters.


III. Parameters

All floating point numbers (indicated by FP) must be enclosed by a pair of
asterisks. All integer numbers (indicated by IN) must be enclosed in parentheses.
The main call to the program and each subparameter must be terminated by a period
(.).

The program is entered by punching the symbols L-P followed by the appropriate
main parameters and subparameters. All main parameters have default options.
Main Parameters to follow L-P

1    Cost of artificial variables *large negative FP number*.
     Default = -1.E50.

2    Minimum value for calculations *FP*. If any calculation falls below
     this value, it is set to zero. Default = internal calculations.

3    Value for accuracy check *FP*. If the absolute value of the calculated
     difference (see General Description) falls below this value, the final
     value is termed inaccurate and calculations are performed to correct
     rounding errors. Default = .5.

4    If 1, suppress print of solution matrix (IN).

5    If 1, suppress print of check matrix (IN).

6    Print every IN-th step, i.e., row operation (IN).

7    If 1, insert a small positive, non-zero number for any zero in the b
     vector. Useful if the b vector contains many zeros.

Subparameter

The program now expects to find the word MINimize or MAXimize followed
by a string of constants which represent, in sequential order, the cost or
values associated with each variable. All non-zero constants (with or without
decimal) must be enclosed by a pair of asterisks. Zeros may be enclosed by
asterisks. A series of sequential zeros may be represented by a pair of
parentheses, i.e., the integer number in the pair of parentheses represents the
number of sequential zeros to be inserted. All coefficients must appear and be
in sequential order.

The cost coefficients representing the objective function are terminated by
a period. The constraints are entered in a similar manner. All variables must
be in sequence, and coefficients of zero must be included. Multiple requirement
vectors are entered in the standard form. Each constraint is terminated by a
period. Comments which do not include a period (.), comma (,), asterisk (*) or
left parenthesis may be entered at any point outside the characters delimiting
constants. The requirement vectors are separated from the rest of the constraint
by relational operators.

The program recognizes three relational operators: LE (less than or equal),
EQ (equal), and GE (greater than or equal). These relational operators are
surrounded by quotes ("). See Section V, Examples.
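To make the card conventions concrete, here is a hypothetical parser for a single subparameter card. It is not part of SOUPAC. It handles constants between asterisks, "(n)" for a run of n zeros, and a quoted relational operator that switches from coefficients to requirement values. All other characters are treated as comment text, and a period outside any delimiter ends the card. It assumes a well-formed card.

```python
def parse_card(card):
    """Return (coefficients, operator, requirement_values) for one L-P card."""
    coefs, rhs, op = [], [], None
    target = coefs
    i = 0
    while i < len(card):
        ch = card[i]
        if ch == "*":                              # constant between asterisks
            j = card.index("*", i + 1)
            target.append(float(card[i + 1:j]))
            i = j + 1
        elif ch == "(":                            # (n) = n sequential zeros
            j = card.index(")", i + 1)
            target.extend([0.0] * int(card[i + 1:j]))
            i = j + 1
        elif ch == '"':                            # "LE", "EQ" or "GE"
            j = card.index('"', i + 1)
            op = card[i + 1:j]
            target = rhs
            i = j + 1
        elif ch == ".":                            # period outside delimiters ends card
            break
        else:                                      # label or comment text
            i += 1
    return coefs, op, rhs


print(parse_card('CASH(2)*1*(1)"LE"*1**2**3*.'))
# ([0.0, 0.0, 1.0, 0.0], 'LE', [1.0, 2.0, 3.0])
```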

Output 

The output consists of the objective function, the final solution matrix,
the variables in solution, and the optimal functional value. In addition,
Shadow Prices or opportunity costs are printed. Shadow Prices provide useful
information on the "cost" of having certain constraints, or the increased
profit to be obtained by 'relaxing' a particular constraint.
For example:

Constraint 1: $2X(1) + 2X(3) \le 5.0$

To this constraint, slack variable X(1) is added to make it an equality. In
the final solution, X(1) is not in solution. The optimal maximum functional
value is 20. The 'Shadow Price' on variable X(1) is 2.0. This means that if
we relax this constraint to 6.0, the optimal maximum value could be 22.0. For
every unit the constraint is relaxed, the functional value will be changed by
the Shadow Price. The Shadow Price holds until the constraint is no longer
binding. The same logic may be applied to "GE" type constraints with surplus
variables. For interpretation of Shadow Prices for structural and artificial
variables, the user is referred to texts under headings such as "Dual Algorithm",
"Interpretation of the Dual", and "Opportunity Costs".

Basis variables refer to those variables which form the original identity
matrix. The variable numbers are listed in the order they were added to the
constraints. The number of basis variables will always equal the number of
constraints. To determine whether a basis variable is a slack or artificial
variable, refer to the coefficients of these variables in the objective
function. A slack variable will have a coefficient of 0.0.

MESSAGES 


PROBLEM  TOO  LARGE:   More  than  300  variables  or  100  constraints  on  input  or 
during  addition  of  slack,  surplus,  and  artificial  variables. 

NORM  FOR  CUTOFF:  Value  of  Main  Parameter  Number  2,  either  supplied  or 
default. 

ERROR  IN  SIMPLX:   Source  Program  Error.   See  a  consultant. 

SOLUTION  UNBOUNDED:   Constraints  do  not  form  a  closed  space.   Optimal 
functional  value  is  infinite. 

NUMBER  OF  ITERATIONS:  Number  of  row  operations  needed  to  calculate  final 
solution.   For  multiple  requirement  vectors,  number  is  not  cumulative. 

ACCURACY ACCEPTABLE or ACCURACY NOT ACCEPTABLE: Comparison with Main
Parameter Number 3.

VARIABLE ADDED

NEW BASIS VARIABLES ARE: Either the iterations were inaccurate and a new
variable was added or, during execution of a multiple requirement vector, a
new variable had to be added to make the problem feasible (requirement vector
positive).

NON-RESOLVABLE TIE: Cannot occur mathematically; it occurs only because of
rounding error in the machine. It can be corrected by incrementing or
decrementing the requirement vector by a small amount. Perform this only for
constants of the same value. (Use Parameter 7.)

Other  messages  should  be  self-explanatory. 

IV. Special Comments

Speed and accuracy can be increased by observing the following suggestions:

1) Never make Parameter 3 (accuracy check) larger than (0.1) × (number of
significant digits in table). For example, if the numbers in the table are
4, 5, .001, 86, 95.32, you have "one" significant digit. Set Parameter 3
to *.1*.


2) Scale numbers in the table to get them into the same range. For example, if
the table entries are of a much smaller order of magnitude than the requirement
vectors (which are of the order 10^3), scale the requirement vectors down to the
order of the table entries and multiply the solution by the same factor to
restore the original scale. The objective function may also be rescaled in a
similar manner. Rescaling essentially reflects the number of significant digits.

V. Examples

The problem:

Minimize $-.75X(1) + 150X(2) - .02X(3) + 6X(4)$
subject to the following constraints:

Constraint (1): $.25X(1) - 60X(2) - .04X(3) + 9X(4) \le$ 0, 1, 2
Constraint (2): $.05X(1) - 90X(2) - .02X(3) - 3X(4) \le$ 0, 1, 2
Constraint (3): $1X(3) \le$ 1, 2, 3

could be set up on cards as follows:

// EXEC SOUPAC
//SOUPAC.SYSIN DD *
L-P*-1.E20***.1*()()(1).
MIN*-.75**150**-.02**6*.
LABOR*.25**-60**-.04**9*"LE"*0**1**2*.
LAND*.05**-90**-.02**-3.0*"LE"*0**1**2*.
CASH(2)*1*(1)"LE"*1**2**3*.
END PROGRAM
END SOUPAC
This problem will result in an unbounded solution with requirement
vector  number  one.   The  problem  terminates  without  performing  calculations 
on  the  other  vectors. 


Note  insertion  of  sequential  zeros  on  CASH  card. 


QUADRATIC  PROGRAMMING 


I.    General  Description 


This program maximizes the quadratic function $cX + \tfrac{1}{2}X'DX$ subject to
the linear constraints $AX \le b$, where c is an n-vector, D is a symmetric
negative definite n by n matrix, A is an m by n matrix of constraint
coefficients, and b is an m-vector.

The Kuhn-Tucker theory shows that a solution to the constrained
maximization problem is obtained if and only if vectors X, L, V, and W can be
found such that:

$$DX - A'L + V = -c$$
$$AX + W = b$$

where the elements of X, L, W, and V are non-negative and the conditions
$V'X = 0$ and $W'L = 0$ are satisfied. To find these vectors, artificial vectors
$Z_1$ and $Z_2$ are added to the first equation and a Y-vector is added to the
second. Simplex techniques are then used to eliminate first the Y
and then the Z variables.
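The conditions can be checked directly for a candidate solution. The sketch below computes V and W from the two equations and tests non-negativity and complementary slackness. It is a verification aid under our own naming, not the simplex procedure the program uses to find the solution.

```python
def kuhn_tucker_ok(d, c, a, b, x, lam, tol=1e-9):
    """Check DX - A'L + V = -c, AX + W = b, X, L, V, W >= 0, V'X = 0, W'L = 0."""
    n, m = len(x), len(lam)
    v = [-c[i] - sum(d[i][j] * x[j] for j in range(n))
         + sum(a[k][i] * lam[k] for k in range(m)) for i in range(n)]
    w = [b[k] - sum(a[k][j] * x[j] for j in range(n)) for k in range(m)]
    nonneg = all(z >= -tol for z in list(x) + list(lam) + v + w)
    vx = sum(vi * xi for vi, xi in zip(v, x))
    wl = sum(wk * lk for wk, lk in zip(w, lam))
    return nonneg and abs(vx) <= tol and abs(wl) <= tol


# Maximize 4X - X^2 (c = 4, D = -2) subject to X <= 1: optimum X = 1, L = 2.
D, c, A, b = [[-2.0]], [4.0], [[1.0]], [1.0]
print(kuhn_tucker_ok(D, c, A, b, [1.0], [2.0]))   # True
print(kuhn_tucker_ok(D, c, A, b, [2.0], [0.0]))   # False: AX > b
```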

References:

Carr, C. R. and Howe, C. H., Quantitative Decision Procedures in Management
and Economics, McGraw-Hill, 1964.

Hadley, G., Nonlinear and Dynamic Programming, Addison-Wesley, 1964.

Wolfe, P., "The Simplex Method for Quadratic Programming", Econometrica,
27, 1959, pp. 382-398.

NOTE: Carr and Howe claim that elements of the W-vector may not be
entered in the first stage of the simplex procedure. Since this requires
that there exist a solution to AX = b, it is a severe restriction. It
is also unnecessary, and this program does enter W-variables during the
first stage. Otherwise, the procedures used closely follow those of
Carr and Howe.

II.   Restrictions 

The maximum number of X-variables is 40. The number of X-variables
plus the number of constraints must be ≤ 80.

The D-matrix must be negative definite. If this is doubtful, use the
PRINCIPAL AXIS FACTOR ANALYSIS program to extract the eigenvalues; all
must be negative. Semi-definite D-matrices may be perturbed, or the user
may limit the number of iterations to be performed. If this limit is
exhausted, the final solution vectors will be printed out (see below).
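Besides inspecting eigenvalues, negative definiteness can be tested by attempting a Cholesky factorization of -D, which succeeds exactly when the symmetric matrix D is negative definite. The sketch below uses hypothetical helper names, not a SOUPAC program. It also shows the effect of the perturbation of parameter 5, which subtracts a small quantity from the diagonal of D.

```python
import math


def is_negative_definite(d):
    """True if the symmetric matrix D is negative definite (Cholesky of -D)."""
    n = len(d)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = -d[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False              # -D is not positive definite
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def perturb(d, eps):
    """Subtract eps from the diagonal of D, as parameter 5 describes."""
    return [[v - eps if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(d)]


semi = [[-1.0, 0.0], [0.0, 0.0]]                   # negative semi-definite
print(is_negative_definite(semi))                  # False
print(is_negative_definite(perturb(semi, .001)))   # True
```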

The  only  form  of  input  is  a  matrix  of  data.   If  there  are  n  x-variables 
and  m  constraints,  the  matrix  should  have  n  +  1  columns  and  m  +  n  rows, 
partitioned  as  follows: 


    D (n x n)     c (n x 1)
    A (m x n)     b (m x 1)


Note that this is the c vector, not the -c vector mentioned in the Kuhn-Tucker formulas. Also note the + sign and the 1/2 coefficient of the x'Dx term. All constraints in this type of input are assumed to be of ≤ type; multiply ≥ constraints through by -1. The equality constraint

$$\sum_{j=1}^{n} a_{ij} x_j = b_i$$

is equivalent to the two constraints $\sum_j a_{ij} x_j \le b_i$ and $\sum_j (-a_{ij}) x_j \le -b_i$. This matrix can be read in from cards or from temporary storage.


The elements of the W-vector are always non-negative and are to be considered "slack" for ≤ constraints and "surplus" for ≥ constraints.

The user may obtain the basis vector at the end of each iteration, showing which variables are in the basis and their quantities (option 2). He may alternatively have the entire matrix printed out after each iteration (option 3). The user is cautioned that option 3 can use immense quantities of paper and time unless the problem is very small.


III.   Parameters 


The  program  call  card  should  have  the  name  QUADRATIC  PROGRAMMING 
followed  by  these  parameters: 


Parameter
Number       Use or Meaning

1            Input Address. SEQUENTIAL 1-15 or CARDS.

2            Number of Constraints.

3            Output option:
               0 if final results only
               1 if iterated basis vectors
               2 for entire iterated matrix

4            Limit on number of iterations if desired. Leave blank
             otherwise. Default is 1000.

5            Perturbation quantity. Punch the quantity to be subtracted
             from the diagonal of the D-matrix between asterisks instead
             of parentheses; e.g., *.001*. Leave blank if not desired.


IV.   Examples 
Example  I 


Suppose we wish to maximize the quadratic function

$$F = 10x_1 + 20x_2 + 45x_3 - 1x_1^2 - 2x_2^2 + 1x_1x_2$$

subject to the constraints

$$\begin{aligned} 2x_1 + 3x_2 + 1x_3 &\le 50 \\ 1x_1 + 3x_3 &\le 70 \\ 3x_1 + 2x_2 &\le 60 \end{aligned}$$


Since the D-matrix is only negative semi-definite, it should be perturbed to ensure convergence to a solution. The following set of cards would solve the problem using data matrix input:

/*ID

// EXEC SOUPAC

//SYSIN DD *

QUADRATIC PROGRAMMING (CARDS)(3)(0)(0)*.001*.

END SOUPAC

DATA(4)(4F3.0)
 -2  1  0 10
  1 -4  0 20
  0  0  0 45
  2  3  1 50
  1  0  3 70
  3  2  0 60
END#


Example II:

Maximize

$$F = 8x_1 + 10x_2 - x_1^2 - x_2^2$$

subject to the constraint

$$3x_1 + 2x_2 \le 6$$

The D-matrix is negative definite. The problem would be set up as follows:

/*ID

// EXEC SOUPAC

//SYSIN DD *

QUAD (C)(1)(2).

END SOUPAC

DATA(3)(F2.0,2F3.0)
-2  0  8
 0 -2 10
 3  2  6
END#

The extreme value of the objective function for this example is .213E02.
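The result for Example II can be cross-checked with a short script. The sketch below is not the program's method (the program follows Wolfe's simplex procedure); it instead enumerates candidate sets of active constraints and keeps the best point satisfying the Kuhn-Tucker conditions, which is only practical for tiny problems. It uses the same input convention as the data matrix (maximize c'x + (1/2)x'Dx subject to Ax ≤ b, x ≥ 0). The function names `solve_linear` and `solve_qp` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_linear(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination; return None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def solve_qp(D, c, A, b):
    """Maximize c'x + (1/2)x'Dx subject to Ax <= b, x >= 0 (D negative semi-definite).

    Brute force over active sets: solve the Kuhn-Tucker equations for each set and
    keep the best primal- and dual-feasible point.
    """
    n = len(c)
    # Stack Ax <= b and -x <= 0 into a single system Gx <= h.
    G = [list(row) for row in A] + [[-1 if j == i else 0 for j in range(n)] for i in range(n)]
    h = list(b) + [0] * n
    best = None
    for k in range(n + 1):
        for S in combinations(range(len(G)), k):
            # Stationarity c + Dx - G_S' mu = 0 together with G_S x = h_S.
            M = [list(D[i]) + [-G[s][i] for s in S] for i in range(n)]
            M += [list(G[s]) + [0] * k for s in S]
            sol = solve_linear(M, [-v for v in c] + [h[s] for s in S])
            if sol is None:
                continue
            x, mu = sol[:n], sol[n:]
            primal_ok = all(sum(g * xi for g, xi in zip(G[r], x)) <= h[r] for r in range(len(G)))
            if primal_ok and all(m >= 0 for m in mu):
                F = sum(ci * xi for ci, xi in zip(c, x)) + Fraction(1, 2) * sum(
                    x[i] * D[i][j] * x[j] for i in range(n) for j in range(n))
                if best is None or F > best[1]:
                    best = (x, F)
    return best


# Example II: maximize 8x1 + 10x2 - x1^2 - x2^2 subject to 3x1 + 2x2 <= 6.
x, F = solve_qp(D=[[-2, 0], [0, -2]], c=[8, 10], A=[[3, 2]], b=[6])
print(x, float(F))   # x = [4/13, 33/13], F = 277/13, about 21.3
```

Exact fractions are used so the Kuhn-Tucker sign checks are not disturbed by rounding; the optimum F = 277/13 ≈ 21.3 agrees with the .213E02 printed by the program.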


EXACT  RESTRICTED  LEAST  SQUARES 


I.   General  Description 

Exact restricted least squares can be used in two ways: 1) to include prior information about a parameter, or 2) to test a linear hypothesis.

Assumptions:

1) $Y = X\beta + u$

2) $u \sim N(0, \sigma^2 I)$

3) X is nonstochastic

4) X has rank K < T

5) $r = R\beta$, where r is a j x 1 known vector, R is a j x K known matrix, and $\beta$ is a K x 1 vector of parameters in the model.

Assumption 5) is the hypothesis to be tested. Exact restricted least squares minimizes $(Y - X\beta_R)'(Y - X\beta_R)$ subject to $r = R\beta$, where $\beta_R$, the restricted estimator, is given by

$$\beta_R = \hat\beta + (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}(r - R\hat\beta),$$

where $\hat\beta = (X'X)^{-1}X'Y$ is the ordinary least squares estimator, and

$$\operatorname{Var}(\beta_R) = \sigma^2\left[(X'X)^{-1} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}R(X'X)^{-1}\right].$$

An unbiased estimate of $\sigma^2$, given that the prior information is true, is

$$\hat\sigma_R^2 = \frac{(Y - X\beta_R)'(Y - X\beta_R)}{T - K + j}.$$


An F test of the null hypothesis $r = R\beta$ is also given. The F statistic is calculated as

$$F = \frac{(SSE_{\beta_R} - SSE_{\hat\beta})/j}{SSE_{\hat\beta}/(T - K)} \sim F_{(j,\,T-K)}.$$
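The restricted estimator and F statistic above can be sketched in a few lines. This is a minimal illustration, not the program's code: it works in exact rational arithmetic so that the restriction $R\beta_R = r$ can be checked exactly, puts a leading column of ones in X for variable 0 (the intercept), and uses invented data. The names `restricted_ls`, `matmul`, `transpose`, and `inverse` are hypothetical.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; raises ValueError if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def restricted_ls(X, Y, R, r):
    """OLS estimates, restricted estimates beta_R, both SSEs and the F statistic."""
    T, K, j = len(X), len(X[0]), len(R)
    XtX_inv = inverse(matmul(transpose(X), X))
    b = matmul(XtX_inv, matmul(transpose(X), Y))                    # OLS estimator
    W_inv = inverse(matmul(matmul(R, XtX_inv), transpose(R)))        # [R(X'X)^-1 R']^-1
    gap = [[ri[0] - rb[0]] for ri, rb in zip(r, matmul(R, b))]      # r - R b
    adj = matmul(matmul(matmul(XtX_inv, transpose(R)), W_inv), gap)
    bR = [[u[0] + v[0]] for u, v in zip(b, adj)]                     # restricted estimator

    def sse(beta):
        e = [[y[0] - f[0]] for y, f in zip(Y, matmul(X, beta))]
        return matmul(transpose(e), e)[0][0]

    sse_ols, sse_r = sse(b), sse(bR)
    F = ((sse_r - sse_ols) / j) / (sse_ols / (T - K))
    return b, bR, sse_ols, sse_r, F


x1 = [1, 2, 3, 4, 5, 6]            # hypothetical exogenous variable 1
x2 = [2, 1, 4, 3, 6, 5]            # hypothetical exogenous variable 2
y = [3, 4, 6, 8, 10, 11]           # hypothetical endogenous variable
X = [[1, a, c] for a, c in zip(x1, x2)]   # column of ones plays the role of variable 0
Y = [[v] for v in y]
R, r = [[0, 1, 1]], [[1]]          # hypothesis beta1 + beta2 = 1, i.e. [R : r] = [0 1 1 1]
b, bR, sse_ols, sse_r, F = restricted_ls(X, Y, R, r)
print([float(v[0]) for v in b], [float(v[0]) for v in bR], float(F))
```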


II. Parameters

The following parameters should follow the mnemonic RLS:

Parameter
Number       Description

1            Input address for raw data. SEQUENTIAL 1-15 and CARDS.

2            Input Restriction Matrix. SEQUENTIAL 1-15 and CARDS.

3            Number of restrictions.

4            Number of equations.

5            1 if want cross products matrix printed.

6            1 if want covariance matrix printed.

7            1 if want covariance matrix of ordinary least squares
             estimates.

8            1 if want correlation matrix printed.

III. Subparameters - equation control cards

1            Number of exogenous variables in the equation.

2 - K+1      Variable numbers of the K exogenous variables.

K+2          Variable number of the endogenous variable.

The intercept term is referred to as the coefficient of variable 0. Variable 0 is considered to be an exogenous variable whose value is always 1.


IV. Input

If all input is from cards, the decks are read in the following sequence:

1) Raw data

2) Restrictions matrix, which must be in the form [R : r], of order j x (K+1)


V.   Output 


The program prints out ordinary least squares estimates first as part of an intermediate step. Also, the F test and a residual covariance matrix are calculated and printed out under ordinary least squares estimation. Coefficients, errors, and the F test are calculated under the restrictions and printed out.
Sample Program

Suppose we are estimating a general linear model of the form

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$$

where $\beta_0$ is the intercept term and $\beta_1$ and $\beta_2$ are the coefficients of the exogenous variables. Let us estimate the coefficients by ordinary least squares and test the hypothesis that the exogenous coefficients sum to 1. The restriction would be of the form:

$$r = R\beta: \quad 1 = \begin{bmatrix} 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$


The following sample program describes the parameters and the order of the cards needed to obtain exact restricted least squares for this model.

/*ID

// EXEC SOUPAC

//SYSIN DD *

RLS(C)(C)(1)(1)(1)(1)(1)(1).

(3)(0)(1)(2)(3).

END P

END S

DATA(3)(3F10.0)

    data deck [X1 X2 Y]

END#

DATA(4)(4F5.0)
   0.   1.   1.   1.
END#

Note: Variable 0 is not included in the raw data matrix, but its coefficient, β0, is included in the restriction matrix [R : r].

References

[1] Goldberger, Arthur S., Econometric Theory, New York, John Wiley and Sons, Inc., 1964.

[2] Judge, G. G., and Yancey, T. A., "The Use of Prior Information in Estimation of the Parameters of Economic Relationships," Metroeconomica, Vol. XXI (1969).
STOCHASTIC RESTRICTED LEAST SQUARES

I. General Description

Stochastic restricted least squares can be viewed as a special case of generalized least squares. Usually we have some prior knowledge of the approximate size of the coefficients, and it seems reasonable that by using this information we can obtain more efficient estimates.

Assumptions:

1) $Y = X\beta + u$

2) $u \sim (0, \sigma^2 I)$

3) X is a set of fixed variates

4) X has rank K < T

5) $r = R\beta + v$, where r is a j x 1 known vector and R is a j x K known matrix of prior information, with the stochastic term v.

6) $v \sim (0, \Psi)$

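We can develop the model in the following manner:

$$\begin{bmatrix} Y \\ r \end{bmatrix} = \begin{bmatrix} X \\ R \end{bmatrix}\beta + \begin{bmatrix} u \\ v \end{bmatrix} \qquad (1)$$

where

$$E\left\{\begin{bmatrix} u \\ v \end{bmatrix}\begin{bmatrix} u' & v' \end{bmatrix}\right\} = \begin{bmatrix} \sigma^2 I & 0 \\ 0 & \Psi \end{bmatrix} = \Phi.$$

Applying generalized least squares to equation (1) gives

$$\beta^* = \left[\begin{bmatrix} X' & R' \end{bmatrix}\Phi^{-1}\begin{bmatrix} X \\ R \end{bmatrix}\right]^{-1}\begin{bmatrix} X' & R' \end{bmatrix}\Phi^{-1}\begin{bmatrix} Y \\ r \end{bmatrix},$$

which reduces to

$$\beta^* = \left[\frac{X'X}{\sigma^2} + R'\Psi^{-1}R\right]^{-1}\left[\frac{X'Y}{\sigma^2} + R'\Psi^{-1}r\right].$$

Since $\sigma^2$ is usually unknown, Theil suggests that a consistent estimate of $\sigma^2$ be used to replace it. Use

$$s^2 = \frac{(Y - X\hat\beta)'(Y - X\hat\beta)}{T - K},$$

where the $\hat\beta$'s are the ordinary least squares estimates of $\beta$. This new estimator

$$\hat\beta^* = \left[\frac{X'X}{s^2} + R'\Psi^{-1}R\right]^{-1}\left[\frac{X'Y}{s^2} + R'\Psi^{-1}r\right]$$

will be consistent and have the same asymptotic moment matrix as $\beta^*$.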

Using additional information is not enough. We would like to know if the prior information is compatible with the sample information. Theil develops a compatibility statistic

$$\theta = (r - R\hat\beta)'\left[\sigma^2 R(X'X)^{-1}R' + \Psi\right]^{-1}(r - R\hat\beta),$$

which is distributed as chi-square with j degrees of freedom. This statistic tests the hypothesis that the sample and prior information are compatible. Since $\sigma^2$ is usually unknown, we must substitute $s^2$ into the statistic:

$$\hat\theta = (r - R\hat\beta)'\left[s^2 R(X'X)^{-1}R' + \Psi\right]^{-1}(r - R\hat\beta),$$

which will have the same asymptotic distribution as $\theta$. Judge and Yancey [1] develop an alternative feasible compatibility statistic,

$$\theta^* = \hat\theta / j,$$

and showed that $\theta^* \sim F_{(j,\,T-K)}$.


The program will print out Theil's compatibility statistic $\hat\theta$. If the user desires $\theta^*$, just divide Theil's statistic by j.
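A minimal sketch of the feasible mixed estimator $\hat\beta^*$ and Theil's statistic $\hat\theta$ follows, assuming exact rational arithmetic and invented data with an intercept column for variable 0. It is an illustration of the formulas above, not the program's code; the names `mixed_estimator`, `matmul`, `transpose`, and `inverse` are hypothetical.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; raises ValueError if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mixed_estimator(X, Y, R, r, Psi):
    """OLS estimates, feasible mixed estimates beta-hat*, s^2 and theta-hat."""
    T, K = len(X), len(X[0])
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtX_inv = inverse(XtX)
    XtY = matmul(Xt, Y)
    b = matmul(XtX_inv, XtY)                                     # OLS estimates
    e = [[y[0] - f[0]] for y, f in zip(Y, matmul(X, b))]
    s2 = matmul(transpose(e), e)[0][0] / (T - K)                 # s^2 replaces sigma^2
    RtPsi_inv = matmul(transpose(R), inverse(Psi))               # R' Psi^-1
    lhs = [[a / s2 + c for a, c in zip(ra, rc)] for ra, rc in zip(XtX, matmul(RtPsi_inv, R))]
    rhs = [[a[0] / s2 + c[0]] for a, c in zip(XtY, matmul(RtPsi_inv, r))]
    b_star = matmul(inverse(lhs), rhs)
    gap = [[ri[0] - rb[0]] for ri, rb in zip(r, matmul(R, b))]   # r - R b
    V = matmul(matmul(R, XtX_inv), transpose(R))                 # R (X'X)^-1 R'
    M = [[s2 * v + p for v, p in zip(rv, rp)] for rv, rp in zip(V, Psi)]
    theta = matmul(matmul(transpose(gap), inverse(M)), gap)[0][0]
    return b, b_star, s2, theta


# Hypothetical data: intercept (variable 0) plus two exogenous variables.
X = [[1, a, c] for a, c in zip([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5])]
Y = [[v] for v in [3, 4, 6, 8, 10, 11]]
R, r = [[0, 1, 1]], [[1]]          # prior belief beta1 + beta2 = 1 ...
Psi = [[Fraction(1, 16)]]          # ... held with variance 1/16
b, b_star, s2, theta = mixed_estimator(X, Y, R, r, Psi)
j = len(R)
print(float(theta), float(theta / j))   # theta-hat and theta* = theta-hat / j
```

When the prior agrees exactly with the sample (r = R times the OLS estimates), $\hat\theta$ is zero and the mixed estimates coincide with ordinary least squares, which is a convenient sanity check.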

II. Parameters

The following parameters should follow the mnemonic STO:

Parameter
Number       Description

1            Input Address for raw data. SEQUENTIAL 1-15 and CARDS.

2            Input Restrictions Matrix. SEQUENTIAL 1-15 and CARDS.

3            Number of restrictions.

4            Number of equations.

5            Input address of covariance matrix of prior information, Ψ.
             SEQUENTIAL 1-15 and CARDS.

6            1 if want cross products matrix printed.

7            1 if want covariance matrix printed.

8            1 if want correlation matrix printed.

Subparameters - equation control cards

Parameter
Number       Description

1            Number of exogenous variables in the equation.

2 - K+1      Variable numbers of the K exogenous variables.

K+2          Variable number of the endogenous variable.

The intercept term is referred to as the coefficient of variable 0. Variable 0 is considered to be an exogenous variable whose value is always 1.

Input

If all input is from cards, the decks are read in the following sequence:

1) Raw Data

2) Restrictions Matrix, which must be in the form [R : r], of order j x (K+1)

3) Covariance Matrix of the restriction

Output

Ordinary least squares estimates and the corresponding F test are printed out as an intermediate step. Theil's compatibility statistic and the stochastically restricted coefficients and errors are also printed out.

Sample  Program 

Suppose we have the model

$$Y = X\beta + u$$

In addition to this simple linear model, we believe that $\sum_{i=1}^{4}\beta_i = 1$. Since we don't know this with complete certainty, we assign a variance of 1/16 to this prior knowledge.

In matrix form, then, $r = R\beta + v$ becomes

$$1 = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{bmatrix} + v$$

and

$$v \sim (0, 1/16).$$

The following sample program describes the parameters and order of the cards needed to obtain stochastic restricted least squares estimates for this model.

/*ID

// EXEC SOUP

//SYSIN DD *

STO(C)(C)(1)(1)(C)(1)(1)(1).

(5)(0)(1)(2)(3)(4)(5).

END P
END S
DATA(5)(5F10.0)

    data deck [X1 X2 X3 X4 Y]

END#

DATA(6)(6F5.0)
   0.   1.   1.   1.   1.   1.
END#

DATA(1)(F10.0)
     .0625
END#

/*

Note: Variable 0 is not included in the raw data, but its coefficient, β0, is considered to be in the restriction matrix [R : r]. The third data deck holds [Ψ = .0625].

References

[1] Judge, G. G., and Yancey, T. A., "The Use of Prior Information in Estimation of the Parameters of Economic Relationships," Metroeconomica, Vol. XXI (1969).

[2] Theil, H., "On the Use of Incomplete Prior Information in Regression Analysis," Journal of the American Statistical Association, Vol. 58 (1963).

[3] Theil, H., and Goldberger, A. S., "On Pure and Mixed Statistical Estimation in Economics," International Economic Review, Vol. 2 (1961).

[4] Yancey, T. A., Judge, G. G., and Bock, M. E., "A Mean Square Error Test When Stochastic Restrictions Are Used in Regression," Quantitative Economics Workshop Paper, Department of Economics, University of Illinois (1970).


THREE  STAGE  LEAST  SQUARES  ESTIMATION 


I. General Description

The THREE STAGE LEAST SQUARES ESTIMATION program calculates and prints out three stage least squares estimates and an asymptotic covariance matrix. A raw data covariance matrix and a two stage least squares residual covariance matrix are the necessary input. Calculations are carried out as in "Econometric Theory" by Arthur S. Goldberger, pp. 347-352. The coefficients may also be stored for use with the ECONOMETRIC REDUCED FORM AND RESIDUAL ANALYSIS program.

References:

Goldberger, Arthur S., Econometric Theory, New York, John Wiley and Sons, Inc., 1964.

Johnston, J., Econometric Methods, New York, McGraw-Hill Book Company, Inc., 1963.

II. Parameters

The parameters appear on the program card following the mnemonic THREE in the following order:

Parameter
Number       Description

1            Input Address for raw data covariance matrix. SEQUENTIAL 1-15.
             Same as output address for K1CLAS.

2            Output Address for coefficients. SEQUENTIAL 1-15.

3            Input Address for residual covariance matrix. SEQUENTIAL 1-15.
             Same as output address for ECON.

4            Number of equations to be estimated.

5            Number of exogenous variables.

Subparameters

For each equation, a card specifying the variables in the equation must follow the main parameter card, with the following parameters:

Parameter
Number       Description

1            Number of exogenous variables in the equation.

2            Number of endogenous variables in the equation.

3 to N + 2   Variable numbers of the N variables included in the equation,
             with exogenous variables first and endogenous variables next,
             with the variable on which the system is normalized last.


III. Special Comments

Although  the  raw  data  deck  is  arranged  with  exogenous  variables  first  and 
endogenous  variables  last,  the  endogenous  coefficients  are  printed  out  first, 
followed  by  the  exogenous  coefficients. 

The  THREE  STAGE  LEAST  SQUARES  ESTIMATION  program  requires  input  from 
several  other  SOUPAC  programs.   The  following  is  an  example  of  the  steps 
needed  to  calculate  the  necessary  input. 

IV.   Example 

K1CLAS(T1)(T2)(0)(0)()(1)(1).

ENDP

K2CLAS(T2)(T3)(T4)()(8)(2)*1.*.

(4)(2)(1)(2)(3)(4)(1)(2).

(4)(2)(5)(6)(7)(8)(2)(1).

ENDP

ECON(T3)(T2)()(8)()(T5).

THREE(T2)(T6)(T5)(2)(8).

(4)(2)(1)(2)(3)(4)(1)(2).

(4)(2)(5)(6)(7)(8)(2)(1).

ENDP 

Notice  that  the  equation  control  cards  for  both  K2CLAS  and  THREE  STAGE 
LEAST  SQUARES  must  be  in  the  same  order. 

Also  notice  that  an  ENDP  card  is  required  after  the  equation  cards. 
THE TRANSPORTATION PROBLEM


I. General Description

The "transportation problem" is a special case of linear programming and is of interest because of its computational simplicity. Many economic and business applications of this computation technique have nothing to do with transportation; the name is derived from its original formulation. The essence of the problem can best be described by a simple example.

Suppose a manufacturer has 3 factories and supplies 5 locations. Suppose that the cost per unit from each factory to each location is given. Also assume that the capacity of each factory is given and that the amount demanded is equal to the total capacity of the manufacturer. The transportation problem is to find the minimum total cost to ship the capacity of all 3 factories to the 5 demand locations.

The following tabled example should make this clearer (taken from reference [1]).


                         Demand Locations
Factory            1      2      3      4      5      Capacity

   1             $10    $15    $20    $20    $40          50
   2              20     40     15     30     30         100
   3              30     35     40     55     25         150

Amount
Demanded          25    115     60     30     70         300

The amount demanded row has the amount demanded at each location. The capacity column contains the amount available at each factory. The middle matrix is the cost matrix of shipping one unit of goods from factory i to location j, where i = 1, 2, 3 and j = 1, 2, 3, 4, 5. Notice that the sum of the capacities must equal the sum of the amounts demanded.


The  idea  is  to  minimize  the  total  transportation  cost  while 
specifying  that  all  goods  must  be  shipped  and  all  demands  must  be 
satisfied.   The  computation  technique  used  to  solve  the  above  problem 
is  described  in  most  linear  programming  texts  (see  references). 
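For a cross-check of the tabled example, the sketch below solves it as a minimum-cost flow by successive shortest augmenting paths (Bellman-Ford on the residual network). This is a swapped-in technique, not necessarily the textbook transportation procedure the program uses, but any correct method reaches the same minimum total cost; it also omits the program's sensitivity analysis. The function name `transport` is hypothetical.

```python
def transport(supply, demand, cost):
    """Minimum-cost shipping plan found by successive shortest augmenting paths."""
    m, n = len(supply), len(demand)
    if sum(supply) != sum(demand):
        raise ValueError("total capacity must equal total amount demanded")
    big = sum(supply)                       # enough capacity for any factory-location route
    S, T, N = m + n, m + n + 1, m + n + 2   # node ids: factories, locations, source, sink
    graph = [[] for _ in range(N)]          # edges stored as [to, residual_cap, cost, rev_index]

    def add_edge(u, v, cap, c):
        graph[u].append([v, cap, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(m):
        add_edge(S, i, supply[i], 0)
        for j in range(n):
            add_edge(i, m + j, big, cost[i][j])
    for j in range(n):
        add_edge(m + j, T, demand[j], 0)

    total = 0
    while True:
        # Bellman-Ford copes with the negative costs on reverse (residual) edges.
        dist, prev = [float("inf")] * N, [None] * N
        dist[S] = 0
        for _ in range(N - 1):
            changed = False
            for u in range(N):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v] = dist[u] + c, (u, k)
                        changed = True
            if not changed:
                break
        if dist[T] == float("inf"):
            break                            # no augmenting path left: all demand is met
        push, v = float("inf"), T
        while v != S:                        # bottleneck capacity along the path
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = T
        while v != S:                        # send the flow and update residual capacities
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total += push * dist[T]

    plan = [[0] * n for _ in range(m)]
    for i in range(m):
        for v, cap, _c, _rev in graph[i]:
            if m <= v < m + n:
                plan[i][v - m] = big - cap   # units shipped from factory i to location j
    return plan, total


supply = [50, 100, 150]
demand = [25, 115, 60, 30, 70]
cost = [[10, 15, 20, 20, 40],
        [20, 40, 15, 30, 30],
        [30, 35, 40, 55, 25]]
plan, total = transport(supply, demand, cost)
print(total)    # 7225 for the tabled data
for row in plan:
    print(row)
```

For these data the minimum total cost is 7225; the shipping plan itself may differ from the program's printout, because this problem has alternative optimal plans with the same cost.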




II. Parameters

The parameters follow the letters TRN:

Parameter
Number       Description

1            Input address of the supply capacities. CARDS and SEQUENTIAL 1-15.

2            Input address of the amount demanded. CARDS and SEQUENTIAL 1-15.

3            Input address of the cost matrix. CARDS and SEQUENTIAL 1-15.

III. Input

Both the supply capacities and the amount demanded are read in as row vectors. If all the data comes from cards, the data decks must be ordered: supply capacities, amount demanded, and then the cost matrix.


IV.   Output 


Output consists of the optimal transportation order printed in matrix form, the total cost of transportation, and a sensitivity analysis showing the maximum reduction of the cost of shipping from supply point i to demand point j that leaves the present solution optimal.

Example

/*ID

// EXEC SOUP

//SYSIN DD *

TRN(C)(C)(C).

END S

DATA(3)(3F5.0)
  50. 100. 150.
END#

DATA(5)(5F5.0)
  25. 115.  60.  30.  70.
END#

DATA(5)(5F5.0)
  10.  15.  20.  20.  40.
  20.  40.  15.  30.  30.
  30.  35.  40.  55.  25.
END#

/*
SPECTRAL ANALYSIS SECTION

AUTOCORRELATIONS


I.   Description 

AUTOCORRELATIONS  is  a  program  designed  to  perform  univariate  spectral 
analysis  of  time  series  data.   No  cross  spectral  calculations  are  made  by 
AUTOCORRELATIONS. 


II.   Input 


The  input  to  AUTOCORRELATIONS  may  be  from  cards  or  sequential  storage. 
Each  time  series  must  be  input  as  a  column  vector.   If  a  matrix,  each  column 
of  which  is  a  time  series,  is  input  to  AUTOCORRELATIONS,  a  univariate  analysis 
will  be  performed  on  each  column. 


III.   Usage 


The parameter string for AUTOCORRELATIONS is:

Parameter
Number       Description

1            Input address. Cards or Sequential 1-15.

2            Minimum number of lags for which spectral estimates are to be
             calculated. This value must be ≥ 2.

3            Maximum number of lags for which spectral estimates are to be
             calculated. This value must be ≤ (series length - 2).

4            Incremental value by which the value of parameter 2 steps up
             to the value of parameter 3.

5            Number of lags, including 0, for which the autocovariances are
             desired. If this value is less than parameter 3, it is ignored.

6            An eight-column matrix with each row consisting of, from left
             to right, the frequency, the autocovariance for the
             corresponding lag, the autocorrelation coefficient for the
             corresponding lag, the raw spectral estimate, the raw spectral
             density estimate, the smoothed spectral estimate, the smoothed
             spectral density estimate, and the log (base 10) of the
             smoothed spectral density estimate can be output to
             Sequential 1-15. The final matrix output consists of the
             vertical augmentation of all the submatrices generated
             according to parameters 2-4.


IV. Printout

The printout, all of which is by default in AUTOCORRELATIONS, consists of the autocovariances and autocorrelations of the series up to the number of lags necessary to calculate the spectral estimates specified in parameter 3 (or up to the number of lags specified in parameter 5, if it is larger than parameter 3), the mean and variance of the series, and the data referred to under parameter 6 above.

V. Calculations

The autocovariances are estimated by

$$C_k = \frac{1}{N}\sum_{t=1}^{N-k}(x_t - \bar{x})(x_{t+k} - \bar{x}),$$

where

$$\bar{x} = \frac{1}{N}\sum_{t=1}^{N} x_t$$

and N is the series length.

The "raw" spectral estimates R(f), f = 0, 1/2M, 1/M, 3/2M, ..., 1/2, are given by

$$R(f) = 2\left(C_0 + 2\sum_{k=1}^{M-1} C_k \cos 2\pi f k\right).$$

The smoothed spectral estimates are given by

$$S(f) = 2\left(C_0 + 2\sum_{k=1}^{M-1} C_k w_k \cos 2\pi f k\right),$$

where the $w_k$'s are the Tukey-Hanning smoothing weights,

$$w_k = \frac{1}{2}\left(1 + \cos\frac{\pi k}{M}\right).$$

The respective densities are obtained by dividing R(f) and S(f) by the sample variance of the series.
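The calculations above can be sketched directly. This is a minimal illustration, assuming the divisor N for the autocovariances and $C_0$ as the leading term of R(f) and S(f), as written above; it is not the program's source, and the names `autocovariances` and `spectral_estimates` are hypothetical.

```python
import math


def autocovariances(x, max_lag):
    """C_k = (1/N) * sum_{t=1}^{N-k} (x_t - xbar)(x_{t+k} - xbar) for k = 0..max_lag."""
    N = len(x)
    xbar = sum(x) / N
    d = [v - xbar for v in x]
    return [sum(d[t] * d[t + k] for t in range(N - k)) / N for k in range(max_lag + 1)]


def spectral_estimates(x, M):
    """Rows of (f, R(f), S(f), S(f)/C_0) for f = 0, 1/(2M), ..., 1/2."""
    C = autocovariances(x, M)
    w = [0.5 * (1 + math.cos(math.pi * k / M)) for k in range(M + 1)]   # Tukey-Hanning
    rows = []
    for i in range(M + 1):
        f = i / (2 * M)
        raw = 2 * (C[0] + 2 * sum(C[k] * math.cos(2 * math.pi * f * k) for k in range(1, M)))
        smooth = 2 * (C[0] + 2 * sum(C[k] * w[k] * math.cos(2 * math.pi * f * k)
                                     for k in range(1, M)))
        rows.append((f, raw, smooth, smooth / C[0]))   # density: divide by the variance
    return rows


# Parameters 2-4 (e.g. 10 to 30 lags in steps of 4) would correspond to a loop such as
#     for M in range(10, 30 + 1, 4): rows = spectral_estimates(series, M)
x = [1, -1] * 4                     # a short alternating test series
for row in spectral_estimates(x, 2):
    print(row)
```

An alternating series concentrates its power at the highest frequency, so the smoothed estimate at f = 1/2 (3.75 here) is much larger than at f = 0 (0.25).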


VI. Example

Suppose we wish to obtain the spectral estimates for a series of length 400 for 10 to 30 lags in steps of 4 lags. Furthermore, suppose the series is punched on cards in 20F4.0 format. A possible SOUPAC program would be:

/*ID (accounting information)

// EXEC SOUP

//SYSIN DD *

MATRIX.

MOVE(C)(S1).

TRA(S1)(S2).

ENDP

AUTO(S2)(10)(30)(4).

ENDS

DATA(400)(20F4.0)

    data deck

END#
/*
CROSPA


I.   General  Description 

CROSPA (Cross-Spectral Analysis) is a program made up of a number of spectral analysis subroutines originally written at Princeton, which have been organized and adapted to the SOUPAC system for the purpose of cross-spectral analysis of multivariate time series. The SOUPAC office wishes to thank Dr. R. M. Leuthold and Tom Jarvis for their assistance in obtaining the subroutines comprising CROSPA.


For  each  time  series  input  to  CROSPA,  the  autocovariances  up  to  a  number 
specified  by  the  user,  raw  spectral  density  estimates,  and  smoothed  spectral 
densities  are  calculated  and  printed.   A  cross-spectral  analysis  is  performed 
for  each  of  the  possible  pairs  from  those  time  series  input  to  CROSPA.   Cross- 
covariances,  raw  and  smoothed  cospectral  density  estimates,  raw  and  smoothed 
quadrature  spectral  density  estimates,  cross  amplitude  spectrum  density 
estimates,  gain,  phase  and  square  coherency  estimates  are  all  calculated  and 
printed. 


II. Usage

The parameter string for CROSPA is:

Parameter
Number       Description

1            Input address. CARDS or SEQUENTIAL 1-15.

2            Input address for filter coefficients (see Section IV,
             Filtering, below). Blank, CARDS, or SEQUENTIAL 1-15. Usually
             parameter 2 will be blank.

3            Number of lags to be used in calculations. To perform an
             analysis with a different number of lags, CROSPA must be
             called again.

III. Input Array

The time series must be input to CROSPA as columns of a matrix. Each time series in the matrix must be of the same length.

IV. Filtering

In some cases the user will wish to filter the series input to CROSPA. CROSPA constructs a linear filter using the coefficients at the address given by parameter 2 and then performs all the subsequent cross-spectral analyses on the input series after they are transformed by this filter. These coefficients must be stored as a row vector.


As a final step, CROSPA constructs "recolored" univariate spectral estimates for the original time series.


The  filtering  option  should  only  be  used  with  caution  by  those  users 
experienced  with  spectral  analysis. 

V. Comments

The raw spectral estimates are the unweighted finite Fourier transforms of the auto- and cross-covariances. The smoothed estimates are calculated with Tukey-Hanning weights.

The auto- and cross-covariance estimates are calculated using n - p as the divisor, where n is the number of observations and p is the lag.
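The following sketch shows how smoothed co-spectrum, quadrature spectrum, and squared coherency estimates can be built from cross-covariances with the n - p divisor and Tukey-Hanning weights. It is an illustration under stated assumptions, not CROSPA's Princeton subroutines: the sign convention for the quadrature spectrum (and hence the phase) varies between texts, and the choice here is an assumption. The names `cross_cov` and `cross_spectrum` are hypothetical.

```python
import math


def cross_cov(x, y, p):
    """Covariance of x_t with y_{t+p}, using n - p as the divisor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((x[t] - mx) * (y[t + p] - my) for t in range(n - p)) / (n - p)


def cross_spectrum(x, y, M):
    """(f, co-spectrum, quadrature spectrum, squared coherency) at f = k/(2M), k = 0..M."""
    w = [0.5 * (1 + math.cos(math.pi * p / M)) for p in range(M + 1)]   # Tukey-Hanning
    cxy = [cross_cov(x, y, p) for p in range(M)]
    cyx = [cross_cov(y, x, p) for p in range(M)]
    cxx = [cross_cov(x, x, p) for p in range(M)]
    cyy = [cross_cov(y, y, p) for p in range(M)]

    def auto(c, f):  # smoothed univariate spectrum from autocovariances c
        return 2 * (c[0] + 2 * sum(w[p] * c[p] * math.cos(2 * math.pi * f * p)
                                   for p in range(1, M)))

    out = []
    for k in range(M + 1):
        f = k / (2 * M)
        co = 2 * (cxy[0] + sum(w[p] * (cxy[p] + cyx[p]) * math.cos(2 * math.pi * f * p)
                               for p in range(1, M)))
        quad = 2 * sum(w[p] * (cxy[p] - cyx[p]) * math.sin(2 * math.pi * f * p)
                       for p in range(1, M))
        sx, sy = auto(cxx, f), auto(cyy, f)
        # Squared coherency is undefined when a smoothed auto-spectrum is not positive.
        coh2 = (co ** 2 + quad ** 2) / (sx * sy) if sx > 0 and sy > 0 else float("nan")
        out.append((f, co, quad, coh2))
    return out


x = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9]
y = [0, 1, 3, 2, 5, 4, 6, 5, 8, 7]      # hypothetical: roughly x delayed one period
for f, co, quad, coh2 in cross_spectrum(x, y, 3):
    print(f"{f:.3f} {co:8.3f} {quad:8.3f} {coh2:6.3f}")
```

A series compared with itself has a zero quadrature spectrum and a squared coherency of 1 at every frequency where its spectrum is positive, which makes a simple consistency check.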


VI. Example

Suppose that the matrix of time series resides on S1, that the user wishes to input the filter coefficients 0.5, 0, -0.5 from a data card punched in 3F5.0 format, and that 20 lags are to be used in the calculations. Such a program might be:

/*ID [accounting information]

// EXEC SOUP

//FT11F001 DD [information to define S1]

//SYSIN DD *

CROSPA(S1)(C)(20).

ENDS

DATA(3)(3F5.0)
  0.5    0 -0.5
END#
/*
SCALE ANALYSIS PACKAGE

CLIQUE ANALYSIS


I. General Description

This routine is designed to enumerate all third-order or higher interrelationships (communication chains) which exist in a sociometric matrix. The algorithm is identical to the method described by Harary and Ross.¹ A communication chain is considered to be any submatrix of order three or more in which all the off-diagonal cells are full.

II. Restrictions


The  maximum  dimensions  for  an  input  array  are  200  x  200.  Input  may 
come  from  cards  or  any  temporary  storage  area.   The  array  must  contain 
only  zeroes  and  ones  in  its  elements.   Any  number  greater  than  zero  is 
considered  to  be  one;  therefore,  care  should  be  used  in  constructing  the 
array.   Symmetry  in  the  input  matrix  is  not  necessary  since  the  program 
automatically  forces  symmetry  through  element-wise  products.   It  is 
suggested  that  TRANSFORMATIONS  be  used  to  modify  input  arrays  when  various 
cut-off  points  are  used  to  distinguish  ones  from  zeroes. 


III.   Parameters 


The  name  CLIQUE  ANALYSIS  appears  first  on  the  program  call  card  and 
is  followed  by  the  following  parameter: 


Parameter
Number       Use or Meaning

1            Input Address of data array. CARDS or SEQUENTIAL 1-15.


IV.   Special  Comments 


The following is an illustration of the clique detection concept.

Data matrix:

    0 1 1 0 0 0 0 0 0
    1 0 1 0 0 0 0 0 0
    1 1 0 1 1 1 0 0 0
    0 0 1 0 1 0 1 0 0
    0 0 1 1 0 1 1 0 0
    0 0 1 0 1 0 1 1 1
    0 0 0 1 1 1 0 1 1
    0 0 0 0 0 1 1 0 1
    0 0 0 0 0 1 1 1 0


Clique (1) 1, 2, 3

Clique (2) 8, 6, 7, 9

Clique (3) 4, 3, 5

Clique (4) 3, 5, 6

Clique (5) 4, 5, 7

Clique (6) 5, 6, 7
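The clique listing above can be reproduced with a short script. The sketch below uses the Bron-Kerbosch maximal-clique enumeration, a different algorithm from the Harary and Ross group-matrix procedure that the program implements, but it reports the same set of cliques of order three or more. Following the description, any positive entry counts as 1 and symmetry is forced through the element-wise product of the matrix with its transpose. The function name `cliques` is hypothetical.

```python
def cliques(matrix, min_size=3):
    """All maximal cliques with at least min_size members (1-based member numbers)."""
    n = len(matrix)
    # Any entry > 0 counts as 1; the element-wise product A * A' forces symmetry.
    adj = [{j for j in range(n) if j != i and matrix[i][j] > 0 and matrix[j][i] > 0}
           for i in range(n)]
    found = []

    def extend(R, P, X):
        # R: current clique, P: vertices that can still extend it, X: vertices already tried.
        if not P and not X:
            if len(R) >= min_size:
                found.append(sorted(v + 1 for v in R))
            return
        for v in list(P):
            extend(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}

    extend(set(), set(range(n)), set())
    return sorted(found)


data = [[0, 1, 1, 0, 0, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 1, 0, 0, 0],
        [0, 0, 1, 0, 1, 0, 1, 0, 0],
        [0, 0, 1, 1, 0, 1, 1, 0, 0],
        [0, 0, 1, 0, 1, 0, 1, 1, 1],
        [0, 0, 0, 1, 1, 1, 0, 1, 1],
        [0, 0, 0, 0, 0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 1, 1, 1, 0]]
for number, members in enumerate(cliques(data), start=1):
    print(f"Clique ({number})", ", ".join(map(str, members)))
```

The members are printed in ascending order, so the numbering and ordering differ from the listing above while the six cliques are the same.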


¹Harary and Ross, "A Procedure for Clique Detection Using the Group Matrix," Sociometry, Vol. 20, No. 3, 1957, pp. 205-215.


PAIRED  COMPARISONS 


I.   General  Description 


Paired comparisons is a method of obtaining empirical estimates of the form "stimulus j is judged greater than stimulus i." Each stimulus in turn serves as the standard; that is, all possible pairs of stimuli are compared. With n stimuli, there are n(n-1)/2 pairs. Comparison of a stimulus with itself is disregarded; it is assumed that a proportion of 0.50 would result. In the following, m = number of subjects = sample size.

Each subject's preferences are tabulated, and the total number of times he preferred each stimulus is computed, producing $A_k$, an n x n matrix of 1's and 0's, k = 1, ..., m. Totals for each stimulus and a grand total are computed for each subject; this m x (n + 1) matrix is referred to as the individual preference sums. The individual tables, $A_k$, are summed over all subjects to form an n x n frequency matrix F, whose elements $f_{ij}$ denote the observed number of times stimulus j was judged greater than stimulus i.


The matrix of proportions, P, is then computed from F, so that $p_{ij}$ is the observed proportion of times stimulus j was judged greater than stimulus i. The matrix Z is derived from P by reference to the normal curve; $z_{ij}$ is the unit normal deviate corresponding to the element $p_{ij}$. These are the sample estimates of the values required to determine the scale values of the stimuli. The scale values are computed by summation, producing $s_j$, a least squares estimate of the scale value of stimulus j.

II. Input


A. Both an indication of ordering for each pair and an array of subjects' choices are required. The former must be given as a set of pair subparameter cards and the latter as an observation of data for each subject in a data deck.

B. In the subjects deck, one number is used to denote the subject's choice for each pair. This choice may be "is greater than," "is better than," "is brighter than," etc. This number is 1 if the subject chose the left, or first, stimulus and 2 if the subject chose the right, or second, stimulus. No other coding is acceptable.

C. The pair cards consist of one mention each of every possible pair of stimuli. The order of the pairs is the same as the order of the subjects' choices, i.e., pair 1 corresponds to item 1 of the subject array. The order of the elements in the pairs is reflected in the subjects' choice deck: if (5,7) corresponds to a 1, then the subject chose stimulus 5 over stimulus 7; if (7,5) were the pair, a corresponding 2 would mean stimulus 5 was preferred. Note that one set of pair specifications serves for all subjects.




III.   Formulas  and  Calculations 


A.   INDIVIDUAL  PREFERENCE  SUMS 


Let A be an individual preference frequency table, and let $a_{ij}$ be an
element of A, $i = 1, \ldots, n$, $j = 1, \ldots, n$, where n is the number
of stimuli:

$$a_{ij} = \begin{cases} 1 & \text{if the individual chose stimulus } j \text{ over stimulus } i, \\ 0 & \text{if the individual chose stimulus } i \text{ over stimulus } j, \end{cases} \qquad a_{ii} = a_{jj} = 0,$$

since no stimulus is compared with itself.

$$\text{Individual preference sum for stimulus } j = \sum_{i=1}^{n} a_{ij}$$


Error messages concerning incorrect frequency tables refer to the
configuration of table A. A can be correct only if the subject data
and pair cards are correct.


B.   STIMULUS PREFERENCE FREQUENCY TABLE, F

Given the matrix A for each of m subjects, with $a_{ij}^{(k)}$ the element
of A for subject k, the stimulus preference frequencies are

$$f_{ij} = \sum_{k=1}^{m} a_{ij}^{(k)}$$


C.   TABLE OF PROPORTIONS, P


If m is the sample size, i.e., the number of subjects, then

$$p_{ij} = \frac{f_{ij}}{m}$$


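D.   TABLE OF NORMAL DEVIATES, Z


Let $p_{ij}$ be an element of the table of proportions. Then let

$$e_{ij} = \sqrt{\log_e\!\left(\frac{1}{p_{ij}^{2}}\right)}$$

$$z_{ij} = e_{ij} - \frac{2.515517 + .802853\,e_{ij} + .010328\,e_{ij}^{2}}{1 + 1.432788\,e_{ij} + .189269\,e_{ij}^{2} + .001308\,e_{ij}^{3}}$$

producing a $z_{ij}$ for each $e_{ij}$. Critical values of p occur at 0, 1,
and .5, so adjustments are made for these values before the formula
is applied and sometimes after. (The formula is Hastings's rational
approximation to the normal deviate, valid for 0 < p ≤ .5; for p > .5 it
is applied to 1 - p and the sign of the result is reversed.)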
E.   SCALE VALUES, S

$$s_j = \frac{\sum_{i=1}^{n} z_{ij}}{n}, \qquad \text{where } n \text{ is the number of stimuli}$$

Then

$$\text{total scale} = \sum_{j=1}^{n} s_j$$

A row of length n + 1, containing the n scale values followed by their total, is calculated.
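The calculations of Section III can be sketched in Python to make the chain A → F → P → Z → S concrete. This is a minimal illustration, not the SOUPAC routine: the function names are hypothetical, and because the manual does not say how proportions of 0, 1, and .5 are adjusted, the sketch assumes 0 and 1 are clipped to [.01, .99] and sets each diagonal proportion to .5 (so $z_{ii} = 0$).

```python
import math


def hastings_deviate(p):
    """Unit normal deviate z with P(Z <= z) = p, using the formula in III.D."""
    if p == 0.5:
        return 0.0
    q = p if p < 0.5 else 1.0 - p  # the approximation needs the smaller tail
    e = math.sqrt(math.log(1.0 / (q * q)))
    t = e - (2.515517 + 0.802853 * e + 0.010328 * e * e) / (
        1.0 + 1.432788 * e + 0.189269 * e * e + 0.001308 * e ** 3
    )
    return t if p > 0.5 else -t


def paired_comparisons(choices, pairs, n, clip=0.01):
    """Return (individual sums, scale values followed by their total).

    choices: one list per subject, coded 1 (first stimulus chosen) or 2.
    pairs:   (left, right) stimulus numbers, 1-based, in response order.
    clip:    assumed adjustment for proportions of 0 or 1 (hypothetical).
    """
    if len(pairs) != n * (n - 1) // 2:
        raise ValueError("need q = n(n-1)/2 pairs")
    m = len(choices)
    freq = [[0] * n for _ in range(n)]  # F: freq[i][j] = times j beat i
    sums = []
    for subject in choices:
        wins = [0] * n  # column sums of this subject's A matrix
        for (left, right), code in zip(pairs, subject):
            if code not in (1, 2):
                raise ValueError("choices must be coded 1 or 2")
            winner, loser = (left, right) if code == 1 else (right, left)
            freq[loser - 1][winner - 1] += 1  # a_ij = 1 with i = loser, j = winner
            wins[winner - 1] += 1
        sums.append(wins + [sum(wins)])  # n sums plus their total
    z = [[0.0] * n for _ in range(n)]  # diagonal stays 0 (p_ii taken as .5)
    for i in range(n):
        for j in range(n):
            if i != j:
                p = min(max(freq[i][j] / m, clip), 1.0 - clip)
                z[i][j] = hastings_deviate(p)
    scale = [sum(z[i][j] for i in range(n)) / n for j in range(n)]
    return sums, scale + [sum(scale)]
```

With the pairs (1,2)(3,1)(4,1)(3,2)(2,4)(4,3) and the three data cards of Example A in Section VII, the first subject's individual sums come out as [3, 2, 0, 1, 6] and stimulus 1 receives the largest scale value. Because $z_{ij} = -z_{ji}$, the total of the scale values is zero up to rounding.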

IV.   Output 

Matrices of individual preference totals and of S, the scale values, are
always printed. The intermediate results F, P, and Z may be printed
on option. All matrices may be stored on option. All results are printed
in F format. F, P, and Z are n x n matrices; the individual totals and S
are m x (n + 1) and 1 x (n + 1) respectively (see Section III for calculations).
The A matrix is printed only in an error situation.

V.   Restrictions 

A.  The maximum number of stimuli is 42. There is no restriction on the
number of subjects. Each stimulus must be paired with every other one.
Subjects should have complete data.

B.  No more than 300 pairs should be specified per pairs statement.
Additional pairs statements may be inserted, to a maximum of 861
pairs (42 stimuli).

C.  Caution: the number of stimuli, n, and the number of pairs, q,
are in the relation

$$q = \frac{n(n-1)}{2}.$$

Any other relationship is invalid.

D.  Note that if (5,7) is a pair, then (7,5) may not also be given. Also, if
(5,7) is the first pair, then the subject's first choice concerns
stimulus 5 vs. stimulus 7. A pair such as (5,5) is invalid.

VI.   Parameters

After the program name, PAIRED COMPARISONS, on the call card come
the parameters in the following order:


Parameter
Number      Use or Meaning

1           Input address of data. CARDS or SEQUENTIAL 1-5.

2           Output address of individual preference sums. Always
            printed. SEQUENTIAL 1-5.

3           Output address of stimulus preference frequency table.
            SEQUENTIAL 1-5 and/or PRINT; if not desired, leave
            parameter blank.

4           Output address of proportions. SEQUENTIAL 1-5 and/or
            PRINT; if not desired, leave parameter blank.

5           Output address of normal deviates. SEQUENTIAL 1-5
            and/or PRINT; if not desired, leave parameter blank.

6           Output address of scale values. SEQUENTIAL 1-5;
            always printed.


Note: It is possible to punch the output from these parameters while
executing this program. If you need this option, see the section in
the Introduction on Input and Output. Any storable output may be
punched using the MATRIX program.

VII.   Examples 

A. 

/*ID  <accounting  information> 

//  EXEC   SOUP 

//SYSIN  DD   * 

PAIRED  COMPARISONS  (C)(  )(P)(P)(P). 

PAIRS  (1,2)(3,1)(4,1)(3,2)(2,4)(4,3).

END  P 

END  S 

DATA(6)(6F1.0) 

122211 

222111 

121122 




END  # 

Print  has  been  indicated  for  all  output  except  individual  preference 
sums  and  scale  values  which  are  always  printed. 

The pairs card indicates that there are 4 stimuli. All possible
pairs of these stimuli are presented to the subjects, and the subjects'
responses are recorded in the order (1,2), (1,3), (1,4), (2,3), (2,4),
(3,4). The members of some pairs have been inverted, showing that
no special order is required; the meaning of a left-member or right-member
response is, of course, reversed by the inversion.

The pairs need not be given in the increasing order of the example,
but at all times the order of the pairs is the order of the corresponding
subject responses.

The data deck contains, for each subject, one response for each pair of stimuli.
B.


/*ID <accounting information>

//  EXEC  SOUP

//SYSIN  DD  *

TRA  (C).

CON(900)*1*.

ADD  (1,625)(900)(1,625).

OUT(S1)(1,625).

END  P

PAI(S1)(  )(S2/P)(  )(P).

PAI(5,3)(2,8)(4,16)(7,26)(8,3)( )(4,13).

PAI(25,24)(23,28)( )(4,12).

END  P


END  S
DATA(325)(75F1.0)

(data deck, 5 cards per subject)
END#
/*


This example shows a program for 26 stimuli; 26 x 25/2 = 325 is
the number of pairs required and the number of subject preferences.
Since no more than 300 pairs may be given per pairs statement, at least
two pairs statements are needed; two are shown. The unique pairs may
occur in any order, but the subject responses must be in the same order.

The TRANSFORMATIONS program shown is designed to recode subject
responses punched as zero/one or blank/one into 1 and 2.

A selection of the possible output has been made. Note that the
individual preference sums and scale values, as well as the normal deviates
and stimulus preference frequencies, are printed. The frequencies are also
stored, which implies that some further use is made of them, perhaps in a
part of the program not shown.


VIII.   References 


Torgerson, Warren S. Theory and Methods of Scaling. New York: John Wiley
and Sons, 1960, pp. 166-173.
SCALOGRAM ANALYSIS


I.   GENERAL DESCRIPTION

The SCALOGRAM ANALYSIS (mnemonic: SCA) was developed to provide a
method of producing Guttman scales automatically, without the need for
external decisions to determine which items do and which items do not enter
into Guttman scales. Items are grouped into as few submatrices as possible,
with maximum homogeneity within each submatrix. Each item from the total
group is assigned to only one submatrix.

The SCALOGRAM program starts by choosing an item from the total
group and then searches the remainder of the items to find an item similar
to the item chosen. Similarity is tested by using an error criterion and a
chi-square test. If both criteria are met, the new item is added to the
first item and a scale is formed. This last item is then used to find another
similar item, and the procedure continues until either of the two criteria
is not met. Whenever a criterion fails, the scale is terminated and a new
scale is started.
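The chaining procedure just described can be sketched as follows, to show how one scale grows item by item and a new scale begins when a test fails. This is not Lingoes's algorithm (see Lingoes 1963 for that). The error criterion used here (the share of responses that pass the harder item but fail the easier one), the error limit of .10, the chi-square cut-off of 3.84, and the rule of always starting a new scale from the first unused item are all illustrative assumptions, and the function names are hypothetical.

```python
def chi_square_2x2(x, y):
    """Pearson chi-square for the 2 x 2 table of two 0/1 items."""
    a = sum(1 for u, v in zip(x, y) if u and v)
    b = sum(1 for u, v in zip(x, y) if u and not v)
    c = sum(1 for u, v in zip(x, y) if not u and v)
    d = len(x) - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else len(x) * (a * d - b * c) ** 2 / denom


def guttman_errors(x, y):
    """Count responses that pass the harder item but fail the easier one."""
    easy, hard = (x, y) if sum(x) >= sum(y) else (y, x)
    return sum(1 for e, h in zip(easy, hard) if h and not e)


def chain_scales(columns, max_error_rate=0.10, min_chi2=3.84):
    """Group item columns into scales by chaining similar items (assumed criteria)."""
    unused = list(range(len(columns)))
    scales = []
    while unused:
        current = unused.pop(0)  # assumed: start from the first unused item
        scale = [current]
        while True:
            size = len(columns[current])
            candidates = [
                j for j in unused
                if guttman_errors(columns[current], columns[j]) / size <= max_error_rate
                and chi_square_2x2(columns[current], columns[j]) >= min_chi2
            ]
            if not candidates:
                break  # a criterion failed: close this scale
            best = min(candidates, key=lambda j: guttman_errors(columns[current], columns[j]))
            unused.remove(best)
            scale.append(best)
            current = best  # the last item added is used to find the next one
        scales.append(scale)
    return scales
```

Given three items that form a perfect Guttman pattern plus one unrelated item, this sketch returns one three-item scale and a one-item scale for the unrelated item.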

SCALOGRAM works only for dichotomous data, and it can be used to
analyze both subject-wise and item-wise. SCALOGRAM differs from Guttman
analysis in three ways: 1) it uses an empirical rather than a rational
basis for selecting items to enter a scale; 2) it uses a statistical method
for deciding on groups and for testing the scalability of the items;
3) it yields multiple scales rather than rejecting the scale hypothesis for
the whole item set.

SCALOGRAM can be considered more descriptive than the raw data
but less so than factor analysis. SCALOGRAM is also unlike factor analysis in
that it is not bound to linear assumptions about the regressions involved.
Factor analysis is set up to study quantitative variables and will
not show correct relationships between qualitative variables; SCALOGRAM will
show what relationships do exist between qualitative variables. (See
Guttman 1950 for a complete discussion of the relation between the scalogram
technique and other statistical procedures, and Lingoes 1963 for the
complete algorithm for SCALOGRAM.)

II.   REFERENCES 

Guttman, L. "Relation of Scalogram Analysis to Other Techniques." In Samuel
A. Stouffer et al., Measurement and Prediction. Princeton, N.J.: Princeton
University Press, 1950 (pp. 172-212).

Lingoes, J. C. "Multiple Scalogram Analysis: A Set-Theoretic Model for Analyzing
Dichotomous Items." Educational and Psychological Measurement XXIII (1963),
501-524.


Lingoes, J. C. "A Multiple Scalogram Analysis of Selected Issues of the 83rd
U.S. Senate." American Psychologist XVII (1962), 327.
III.   INPUT

Input to the SCALOGRAM ANALYSIS program consists of a rectangular array
of dichotomous variables. Scaling is done on columns. Thus an N x M array
of data will be scaled across the N "subjects" and yield up to M scales.
To scale on the M "items," set the transpose flag in SCALOGRAM (see
Parameters).

Zero and one are the usual values of the dichotomy. Two is taken as
missing data and distributed randomly among the other codes. Blanks are
read as zeros. In general, 2 is missing data, 1 is one level of the dichotomy,
and "anything else" is the other level of the dichotomy.
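A minimal sketch of the coding rules just described, assuming each raw value arrives as a character string read from a card column. The manual says only that 2s are "distributed randomly among the other codes"; giving 0 and 1 equal probability is an assumption made here, and the function name is hypothetical.

```python
import random


def recode_dichotomy(raw, rng=None):
    """Map raw card codes to 0/1: '1' -> 1, '2' -> random 0/1, anything else -> 0."""
    rng = rng or random.Random()
    out = []
    for value in raw:
        code = value.strip()
        if code == "1":
            out.append(1)
        elif code == "2":  # missing data: assumed 50/50 split between levels
            out.append(rng.randint(0, 1))
        else:  # blanks, zeros, and any other code form the other level
            out.append(0)
    return out
```

Passing a seeded `random.Random` makes the treatment of missing data reproducible from run to run.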

Labels may consist of up to 28 characters, one label per card. If
both labels and data are read by SCALOGRAM from cards, the labels deck
goes first. If SCALOGRAM is to scale subjects, labels must refer to
subjects. Note that if labels are not used, considerably more core is
available for variables and subjects. If labels are used, the data card
should be DATA(n)(nA4), where n ≤ 7.

IV.   PARAMETERS

The program mnemonic is SCA. The following parameters appear on the
program card:


Parameter   Description

1           Input address

2           Address of labels

3           A 1 indicates that the matrix should be transposed


Since the program scales by columns, or items, to scale by subjects
enter a 1 as parameter 3.

V.   RESTRICTIONS 

Data must be dichotomous (see Input section). If the data are not of this
form, TRANSFORMATIONS may be used to recode them.

VI.   EXAMPLES

SCA(C)(C)(1).

END S

DATA(7)(7A4)

(labels)

END#

DATA(40)(40F1.0)

(data)


END#

Labels and data are on cards, 28 columns are used for labels, and scaling
will be done by rows.

SCA(S1).

END S

DATA(30)(30F1.0)

(data)


END#


Data is on SEQUENTIAL 1 and scaling will be done by columns.


PROBIT  ANALYSIS  SECTION 
PROBIT


(Mnemonic: PRB)


I.   General Description

This program calculates maximum likelihood estimates for the parameters
A and B in the probit equation

$$Y = A + BX$$

An iterative scheme is used.


II.   Restrictions

The input vectors must be of equal length k, where 3 < k < 3000. Each
input vector comes from a separate input address.

III.   Parameters


Parameter
Number      Use or Meaning

1           Input vector of dosage levels. CARDS or
            SEQUENTIAL 1-15.

2           Input vector of the number of subjects tested at each
            dose level. CARDS or SEQUENTIAL 1-15.

3           Input vector containing the number of subjects at each
            level responding to the drug. CARDS or SEQUENTIAL 1-15.

4           Output vector of length k containing the proportion of
            subjects responding to the various dose levels of the
            drug. SEQUENTIAL 1-15 and/or PRINT.

5           Output vector of length k containing the values of the
            expected probit for the various levels of the drug.
            SEQUENTIAL 1-15 and/or PRINT.


Printed output consists of:

1 - Estimate of intercept constant A

2 - Estimate of probit regression coefficient B

3 - Chi-square value for a test of significance of the final probit equation:

$$\chi^2 = \sum_{i=1}^{k} \frac{(R_i - N_i P_i)^2}{N_i P_i (1 - P_i)}$$

where $R_i$ = number of responses (input address 3),
$N_i$ = number of subjects tested (input address 2),
$P_i$ = cumulative normal distribution value corresponding to $Z_i$,
where $Z_i = (A + BX_i) - 5$ and A and B are from the final probit equation.

4 - Degrees of freedom for $\chi^2$: d.f. = k - 2
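The iterative maximum likelihood fit and the chi-square statistic above can be sketched in Python as follows. This uses the textbook iteratively reweighted (Finney-style) probit scheme with the classical convention that a probit is the normal deviate plus 5, matching $Z_i = (A + BX_i) - 5$. It is not the IBM SSP code the program was adapted from: the starting values, the clipping of observed proportions of 0 and 1, and the convergence tolerance are assumptions, and the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def probit_fit(dose, n, r, max_iter=50, tol=1e-10):
    """Fit Y = A + B*X with P = Phi(Y - 5); return (A, B, chi2, df)."""
    k = len(dose)
    if len(n) != k or len(r) != k:
        raise ValueError("input vectors must have equal length k")
    if k < 3:
        raise ValueError("need k >= 3 so that d.f. = k - 2 is positive")
    p_obs = [ri / ni for ri, ni in zip(r, n)]
    # Start from empirical probits; proportions of 0 or 1 are clipped (assumption).
    start = [5 + STD_NORMAL.inv_cdf(min(max(p, 0.01), 0.99)) for p in p_obs]
    a, b = weighted_line(dose, start, [1.0] * k)
    for _ in range(max_iter):
        working, weights = [], []
        for xi, ni, pi in zip(dose, n, p_obs):
            y = a + b * xi
            P = STD_NORMAL.cdf(y - 5)
            dens = STD_NORMAL.pdf(y - 5)
            working.append(y + (pi - P) / dens)  # working probit
            weights.append(ni * dens ** 2 / (P * (1 - P)))  # probit weight
        new_a, new_b = weighted_line(dose, working, weights)
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:
            break
    chi2 = 0.0
    for xi, ni, ri in zip(dose, n, r):
        P = STD_NORMAL.cdf(a + b * xi - 5)
        chi2 += (ri - ni * P) ** 2 / (ni * P * (1 - P))
    return a, b, chi2, k - 2
```

If the response counts are exactly the expected values for some A and B, the sketch recovers those parameters and the chi-square is essentially zero; with real counts the chi-square measures lack of fit on k - 2 degrees of freedom.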

References: 

D. J. Finney, Probit Analysis, Second Edition (Cambridge University Press,
1952).

The program was adapted from the IBM Scientific Subroutine Package,
360A-CM-03X, Version III, page 44.

IV.   Example 

If  two  or  more  input  addresses  are  cards,  the  cards  must  be  stacked 
in  order  of  their  parameter  numbers.   For  example: 

/*ID 

//   EXEC   SOUPAC 

//SOUPAC.SYSIN  DD   * 

MAT. 

MOVE(CARDS)(SEQ2).

END  P 

PRB(CARDS)(SEQ2)(CARDS)(PRINT).

END  S 

DATA(1)( ) 

Cards for SEQ2

END# 

DATA(1)( ) 

Cards  for  Parameter  1 

END# 

DATA(1)( )


Cards  for  Parameter  3 


END# 
/* 


NOTE: The mnemonic for PROBIT is PRB, not PRO.


RANDOM  NUMBER  GENERATION  SECTION 


RANDOM  NUMBER  GENERATOR 


I.   General Description

This program generates a matrix of random numbers or digits from a
specified probability distribution.

II.   Main Parameter Card


Parameter
Number      Description

1           Input address of 9 (nine) digit integer, used as a
            starting point for the random number generator.
            CARDS or SEQUENTIAL 1-15.

2           Output address of random numbers matrix. SEQUENTIAL
            1-15. PRINT is not valid.

3           Number of rows in output matrix of random numbers.

4           Number of columns in output matrix of random numbers.

5           Output address of 9 digit integer which is the finishing
            point of the random number generator. Do not specify
            PRINT, since the number is automatically printed.


III.   Subparameter Cards

To specify the distribution of the random numbers, choose one of the
subparameter cards listed below. Refer to Appendix E for definitions of
these distributions, as well as information on obtaining distributions
not given below.


BINOMIAL (N)*P*.          The sum of N independent trials, each with
                          probability P of success.

RECTANGULAR *θ1**θ2*.     Sometimes called the continuous uniform
                          distribution. The probability of each interval
                          of fixed size in [θ1, θ2] is the same.

NORMAL *μ**σ²*.           Here μ is the desired mean of the normally
                          distributed variables, and σ² the variance.

GAMMA *α**β*.             The Gamma distribution with parameter β and
                          α degrees of freedom; α should be integer
                          valued. Note that GAMMA *n/2**2*. is the
                          chi-square distribution with n degrees of
                          freedom, while GAMMA *1.**λ*. is sometimes
                          called the negative exponential distribution
                          with parameter λ.

DISCRETE (N).             Yields random digits from 0 to N, each with
                          equal probability of occurring.
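As an illustration of what each subparameter card asks for, the distributions above correspond roughly to the following draws from Python's standard `random` module. This only illustrates the definitions: Python's generator (Mersenne Twister) is not SOUPAC's, so the numbers will differ, and reading β as the scale parameter of the gamma distribution (as `random.gammavariate` does) is an assumption about SOUPAC's parameterization. The function name is hypothetical.

```python
import math
import random


def random_matrix(dist, params, rows, cols, seed):
    """Return a rows x cols list of lists drawn from the named distribution."""
    rng = random.Random(seed)
    if dist == "BINOMIAL":
        n, p = params
        draw = lambda: sum(rng.random() < p for _ in range(n))  # N trials
    elif dist == "RECTANGULAR":
        lo, hi = params
        draw = lambda: rng.uniform(lo, hi)
    elif dist == "NORMAL":
        mean, variance = params  # the card gives the variance, not the s.d.
        draw = lambda: rng.gauss(mean, math.sqrt(variance))
    elif dist == "GAMMA":
        alpha, beta = params  # beta assumed to be a scale parameter
        draw = lambda: rng.gammavariate(alpha, beta)
    elif dist == "DISCRETE":
        (n,) = params
        draw = lambda: rng.randint(0, n)  # digits 0..N, equally likely
    else:
        raise ValueError(f"unknown distribution {dist!r}")
    return [[draw() for _ in range(cols)] for _ in range(rows)]
```

For example, `random_matrix("NORMAL", (10.0, 4.0), 3, 5, seed=1)` gives a 3 x 5 matrix with mean 10 and variance 4, and the same seed always reproduces the same matrix.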
IV.   Special Comments

If this program is used with the same integer starting point, it will
generate the same numbers. Thus, use Parameter 5 to output the finishing
point, and then pass that address as the starting point for the next
use of this program.
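The reason for passing the Parameter 5 output back in as Parameter 1 can be seen with any deterministic generator. The sketch below uses the multiplicative congruential recurrence of the IBM SSP routine RANDU, $x_{k+1} = 65539\,x_k \bmod 2^{31}$, purely as a stand-in; the manual does not say which generator SOUPAC uses, and the function name is hypothetical.

```python
MODULUS = 2 ** 31
MULTIPLIER = 65539  # RANDU's multiplier, used here only as an example


def uniform_run(seed, count):
    """Return `count` uniforms in (0, 1) and the finishing seed."""
    if seed <= 0 or seed % 2 == 0 or seed >= MODULUS:
        raise ValueError("seed must be an odd integer in (0, 2**31)")
    x = seed
    values = []
    for _ in range(count):
        x = (MULTIPLIER * x) % MODULUS  # odd stays odd, so x never reaches 0
        values.append(x / MODULUS)
    return values, x
```

Generating 10 numbers from one seed gives exactly the same sequence as generating 5, saving the finishing seed, and restarting from it for 5 more; restarting from the original seed instead would simply repeat the first 5 numbers.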

It should be noted that the time required to generate random numbers
varies greatly among the various distributions. More specific information
is available from the SOUPAC Office.

V.   References 


IBM System/360 Scientific Subroutine Package (360A-CM-03X) Version 2,
page 54.
UTILITY PROGRAM

I.   General Description

The UTILITY program has been designed to handle small utility functions
which do not necessitate or justify the creation of a unique program within
the SOUPAC system. The following statements will invoke the UTILITY program:

UTILITY.

(insert subparameter card or cards here)
END P
The following sections describe the functions of the various subparameters.

II.   PRESORT  Program 

The PRESORT program is presently the only program in the UTILITY program.
It is used to set up the data to be input into the IBM SORT/MERGE
package, which will be executed after the present SOUPAC program and
before another SOUPAC program which will use the sorted data as input.

SORT(0 or 1)(0 or 1)(V1)(V2).

Parameter
Number              Use or Meaning

1                   0 if data is to be sorted in ascending order.
                    1 if data is to be sorted in descending order.

2                   0 if data to be sorted is in single precision.
                    1 if data to be sorted is in double precision.

3 through n < 20    Indicates the variable or variables to be sorted on,
                    with the later variables, if any, varying most
                    rapidly.
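The ordering these parameters request can be sketched as follows, assuming each data row is a list of numbers and variables are numbered from 1. "Later variables varying most rapidly" means the first variable listed is the major sort key. This mimics only the sort order; it does not emulate the IBM SORT/MERGE step, its record format, or the precision flag in parameter 2, and the function name is hypothetical.

```python
def presort_order(rows, descending, variables):
    """Order rows on 1-based variable numbers; the first listed is the major key."""
    columns = [v - 1 for v in variables]  # 1-based variable numbers -> indices
    return sorted(
        rows,
        key=lambda row: tuple(row[c] for c in columns),
        reverse=bool(descending),  # parameter 1: 0 ascending, 1 descending
    )
```

For instance, `presort_order(rows, 0, [5])` corresponds to the SORT(0)(1)(5) card in the example that follows, apart from the precision setting.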


Following the SOUPAC program in which the UTILITY program appears,
the following card must appear:

// EXEC SOUPSORT,INPUT=Snn,OUTPUT=Smm

where nn and mm represent the two-digit equivalents of the sequential unit
numbers to be input to the sort and output from the sort to the next SOUPAC
program. The two units must not be the same.

The next card will start the next SOUPAC program, which will operate
under the assumption that the sorted data has been supplied on the
specified sequential unit as the output of the SOUPSORT step.

// EXEC SOUPAC,DISP=OLD
//SYSIN  DD   * 

(Your  program  which  uses  the  sorted  data). 
The example given below sorts card input data so that it may be
input into a FREQUENCY program which uses variable 5 as a control
variable.

/*ID  identification  card  information 

//   EXEC   SOUPAC

//SYSIN  DD   *

MATRIX.

MOVE(CARDS)(S1).

END  PROGRAM

UTILITY.

SORT(0)(1)(5).

END  PROGRAM

END  SOUPAC

DATA(10)(10F5.0).

(user's data deck)

END  #

//   EXEC   SOUPSORT,INPUT=S01,OUTPUT=S02

//   EXEC   SOUPAC,DISP=OLD

//SYSIN  DD   *

FREQUENCY(S2).

TWO.

PER(1)(1)(1).

CONTROL(5).

END  PROGRAM

END  SOUPAC

/* 

The data on S1 is sorted in ascending order on variable 5. The data is
passed to the SOUPSORT job step on S1 in double precision, sorted on
variable 5, and then output onto S2 in double precision. It is then input
into the FREQUENCY program of the next SOUPAC job step, whereupon
analysis continues.

III.   Notes,  Restrictions,  and  Ideas 


1.  The default output from MATRIX is in double precision.

2.  The output from TRANSFORMATIONS is in single precision.

3.  Only one utility program is allowed per SOUPAC program.

4.  If other sequential units besides the one passed to the sort job step
have been used during the first SOUPAC program, they are still intact
and usable in the second SOUPAC program because of the DISP=OLD
parameter.
OPTIONS TO A SOUPAC JOB:
PARMS, PROLOG CARDS, AND $-CONTROL CARDS


A.   PARMS


PARMs are arguments to the keyword 'Parameter,' contracted into the
keyword 'PARM', which give instructions to a processor running under a
360 system. In this context, SOUPAC is a processor running under a 360
system. PARMs are always coded on an EXEC card and have the following form:


// EXEC SOUPAC,PARM='OPT1,OPT2,...,OPTm'


The permissible options to be used as SOUPAC PARMs are listed below with an
explanation of their use and function. Note that the default is underlined;
that is, // EXEC SOUPAC is equivalent to // EXEC SOUPAC,PARM='OPT1,OPT2,...',
where each underlined PARM is to be taken as one of the options in the
PARM string of the example. These PARMs give the SOUPAC system instructions
in the same way that parameters give SOUPAC statistical or data management
programs instructions.

1.  NODYNAM or DYNAM

NODYNAM implies that a non-dynamically allocatable version of the
library of statistical procedures is to be used. This version will
run in some 150K of core and will handle fewer variables than the
dynamically allocatable version. DYNAM will use the dynamically
allocatable version of any program requested, which will handle more
variables in an arbitrarily specified amount of core above a certain
minimum. If using DYNAM, see the SOUPAC consultants for a handout on
optimal region sizes for particular numbers of variables.

2.  EXECUTE or NOEXECUTE

NOEXECUTE implies that the SOUPAC parameter deck, which the Syntax
Interpreter scans to build intermediate parameters, should not be
executed; only a syntax check is to be performed. If EXECUTE is
specified and no errors are found by the Syntax Interpreter, the job
step will proceed. If EXECUTE is specified and errors are found by the
Syntax Interpreter, execution of the step may continue, depending upon
whether LET or NOLET is also specified.

3.  NOLET  or  LET 

If  an  error  is  found  by  the  Syntax  Interpreter  and  EXECUTE  has 
been  specified,  execution  will  proceed  only  if  LET  was  also 
specified.   In  this  case,  execution  will  proceed  only  through  the 
last  program  processed  which  was  completely  error  free.   If  NOLET 
was  specified  and  errors  are  found  by  the  Syntax  Interpreter, 
execution  will  not  be  permitted. 

4.  LIST or NOLIST

LIST  indicates  that  all  program  cards  are  to  be  listed.   NOLIST 
indicates  that  only  the  prolog  section  of  the  SOUPAC  parameter 
deck  is  to  be  listed. 
5.   PGM or NOPGM

PGM indicates that a complete SOUPAC parameter deck and data decks
follow. NOPGM indicates that only the prolog section and data
deck follow, and that the intermediate parameters are being provided
by the user by overriding the cataloged procedure. This implies
that the user has previously run a SOUPAC job and has saved the
two necessary data sets so that he may run the same program again.
To make sure these data sets are saved correctly, a user should visit
the SOUPAC office first.

If any error is found by the Syntax Interpreter in the prolog section,
the job step will not continue.

If the job step which generated the intermediate parameter data
sets found syntax errors, execution of the job step in which
NOPGM is specified will continue (if EXECUTE is specified) through
the last program processed which was completely error free,
regardless of whether LET or NOLET was specified in either job step.

Examples:

To do just a syntax check:

// EXEC SOUPAC,PARM='NOEXECUTE'

To execute up to the first program found to have syntax errors:

// EXEC SOUPAC,PARM='LET'

To execute up to the first program found to have syntax errors and
use the dynamically allocatable library:

// EXEC SOUPAC,PARM='LET,DYNAM'

Note that the PARMs may be listed in any order.
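The interplay of EXECUTE, NOEXECUTE, LET, and NOLET in items 2 and 3 can be summarized as a small decision function. It is a hypothetical model, not SOUPAC code: it assumes EXECUTE and NOLET are the defaults, as the PARM examples imply, reads "up to the first program found to have syntax errors" as running every program before the first one with errors, and ignores NOPGM.

```python
def programs_executed(parm, has_syntax_error):
    """Number of SOUPAC programs that run under the given PARM string.

    parm: the text inside PARM='...', e.g. "LET,DYNAM" ("" for none).
    has_syntax_error: one flag per program, in deck order.
    """
    options = {opt.strip().upper() for opt in parm.split(",") if opt.strip()}
    execute = "NOEXECUTE" not in options  # assumed default: EXECUTE
    let = "LET" in options  # assumed default: NOLET
    if not execute:
        return 0  # syntax check only
    if True not in has_syntax_error:
        return len(has_syntax_error)  # clean deck: every program runs
    if not let:
        return 0  # NOLET: any syntax error stops the step
    return has_syntax_error.index(True)  # run the programs before the first error
```

Because the options are collected into a set, their order on the card does not matter, in line with the note above.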

PROLOG  OF  A  SOUPAC  JOB 


Described below are several # control cards which may appear in the
prolog of a SOUPAC job. Within the prolog these control cards may
appear in any order. If prolog control cards are used, they must appear
immediately after the SYSIN card. The Syntax Interpreter determines
the end of the prolog when it reads a card which is not one of these
types. All types have parameters and must be terminated by a period.
Prolog cards may not have continuation cards; hence all parameter
information must be punched within 80 columns. There is no limit
to the number of prolog cards permitted, nor is there any restriction on
the number of any one type. If conflicting information is entered, the
information entered last overrides any previous definitions.
1.   #REPEAT OPTION

The #REPEAT OPTION is used to repeat sections of a SOUPAC
parameter deck an optional number of times. The #REPEAT card
which appears in the prolog section will be followed by up to
22 (twenty-two) integer parameters which indicate the number
of repetitions of up to 22 repeat sequences. The card sequences to
be repeated will be preceded and followed by #SREP and #EREP
cards respectively. Example:

/*ID 

//  EXEC  SOUP 

//SYSIN  DD   * 

#REPEAT (2).

<additional program cards>

#SREP

CORRELATION (C)( )(S1).

SQUARE ROOT FACTOR ANALYSIS (S1)(P(F))(20)(C)(P(F)).

#EREP 

END  S 

In  this  example  the  program  sequence  of  CORRELATION  and  SQUARE  ROOT 
FACTOR  ANALYSIS  will  be  repeated  twice.   Four  card  input  data 
sets  would  be  required  for  the  repeated  sections. 

Repeat sequences which begin before a main program and end in
a subprogram, or which begin in a subprogram and do not end in the
same subprogram, are not allowed. Nested or overlapping repeat sequences
are not allowed. Also, a #SREP card cannot be immediately followed
by a #EREP card, and a single appearance in the deck of either card
will cause an error.
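The expansion rules above can be sketched as a small function over a list of card images. This is a hypothetical illustration, not the Syntax Interpreter: it checks only the errors named in the text (nesting, an empty #SREP/#EREP pair, an unmatched card, more than 22 counts) and does not check main program and subprogram boundaries.

```python
def expand_repeats(cards, counts):
    """Expand #SREP ... #EREP sequences; counts[i] is the count for sequence i."""
    if len(counts) > 22:
        raise ValueError("#REPEAT accepts at most 22 counts")
    out, block, in_block, used = [], [], False, 0
    for card in cards:
        tag = card.strip().upper()
        if tag == "#SREP":
            if in_block:
                raise ValueError("nested or overlapping repeat sequences")
            in_block, block = True, []
        elif tag == "#EREP":
            if not in_block:
                raise ValueError("#EREP without a matching #SREP")
            if not block:
                raise ValueError("#SREP immediately followed by #EREP")
            if used == len(counts):
                raise ValueError("more repeat sequences than #REPEAT counts")
            out.extend(block * counts[used])  # copy the sequence counts[used] times
            used += 1
            in_block = False
        elif in_block:
            block.append(card)
        else:
            out.append(card)
    if in_block:
        raise ValueError("#SREP without a matching #EREP")
    return out
```

Applied to the CORRELATION example with counts [2], the two program cards between #SREP and #EREP each appear twice in the expanded deck.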

2.   #V-UNIT OPTION

#V-UNIT allows the user to change input and output addresses
in the execution of one SOUPAC job. The form of a #V card is as follows:

#Vn(m)(A1)(A2)...(Ak).

where n is an integer 1 through 9, so there can be at most 9
variable addresses, namely V1 through V9; and m is a counter which
determines how many times a variable address may be used before it
assumes the next value in its list of possible values. A1 through Ak
are the addresses which Vn assumes. These can be any valid address.
At the moment, however, forms like (S1/P) will not work. Note that
CARDS and PRINT are permitted.


Finally, the list of addresses is cyclic; that is, if, after
Ak has been used, Vn occurs again in the program, Vn will have the
value A1, and so on.

/*ID 

//   EXEC  SOUP 

//SYSIN  DD   * 

#V9(1)(S1)(S2)(S3)(S4).

#V5(1)(S1)(S2)(S3)(S4).

#REPEAT (4).

MAT. 

#SREP 

MOV  (C)(V9). 

#EREP 

HOR  (V5)(V5)(V5)(V5)(S5).

END  P 


END  S 

This program segment reads four separate card decks, saving them in
temporary storage, and horizontally augments them into one data set.

The equivalent without the use of #REPEAT and #V would be as follows:

/*ID 

//   EXEC   SOUP 

//SYSIN  DD   * 

MAT.

MOV(C)(S1).

MOV(C)(S2).

MOV(C)(S3).

MOV(C)(S4).

HOR(S1)(S2)(S3)(S4)(S5).

END  P 


END  S 

Note that the MOV(C)(V9). statement is expanded into four MOVE statements
and V9 takes the values S1 through S4. Similarly, V5 takes on the
values S1 through S4.
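The substitution rule for #V addresses (use each value m times, then move on, cycling back to A1 after Ak) can be sketched with a regular expression over card images. This is a hypothetical illustration: it assumes each variable address appears in a card as a parenthesized token such as (V9), and it does not handle #V forms the text says are unsupported, such as (S1/P).

```python
import re


def substitute_v_units(cards, definitions):
    """Replace each (Vn) token with the next address in its cyclic list.

    definitions maps n to (m, [A1, ..., Ak]); each address is used m times
    before Vn moves on, and the list wraps around after Ak.
    """
    uses = {n: 0 for n in definitions}

    def next_address(match):
        n = int(match.group(1))
        m, addresses = definitions[n]
        value = addresses[(uses[n] // m) % len(addresses)]
        uses[n] += 1
        return "(" + value + ")"

    return [re.sub(r"\(V([1-9])\)", next_address, card) for card in cards]
```

Applied to the four expanded MOV (C)(V9). cards and the HOR card with the two #V definitions of the example, this yields MOV (C)(S1). through MOV (C)(S4). and HOR (S1)(S2)(S3)(S4)(S5).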

3.  #OLD OPTION


The  #OLD  option  is  used  to  define  the  number  of  rows  in  a 
sequential  data  set  created  by  a  previously  run  SOUPAC  job.   The 
number  of  rows  is  then  entered  into  a  table  in  the  monitor.   This 
option  should  be  used  whenever  the  header  record  on  the  data  set 
is  not  known  to  have  a  correct  value  for  the  number  of  rows,  and  the 
user  does  not  want  to  execute  a  MATRIX  MOVE  to  count  the  rows.   To 
use  the  option,  punch  a  card  with  #OLD  in  the  first  four  columns. 
Then  code  the  address  and  the  number  of  rows  in  the  usual  SOUPAC 
fashion. The number of columns may be coded on the card if desired,
but  will  be  totally  ignored.   Include  this  card  in  the  prolog  section 
of  the  SOUPAC  job. 

For  example,  to  indicate  that  a  data  set  to  be  input  from 
SEQUENTIAL  1  has  77  rows  you  would  prepare  the  following  card: 

#OLD (S1) (77).
4.   #TEST OPTION

There is also a #TEST option; however, this facility is complicated,
is intended for testing purposes within the SOUPAC office, and has no
significant advantage for the general user.

5.   #DEFINE OPTION

Whenever the user wishes to specify the dimensions of a direct access
data set (DISK address), punch #DEFINE in the first seven columns of
a card, followed by the address, number of rows, and number of columns
coded in the usual SOUPAC fashion. Include this card in the prolog
section of your program. For double precision matrices, code the same
number of rows but twice as many columns as otherwise. DISK 1 and DISK 2
have default definitions of 440 rows by 450 columns, single precision.
If the user desires any other dimensions on these data sets, #DEFINE must
be used. If the user desires to use any DISK address other than DISK 1
and DISK 2, #DEFINE must be used in addition to supplying the necessary
DD cards.

For example, to define a data set for DISK 17 with 20 rows and 40
columns double precision, you would prepare the following card:

#DEFINE (DISK 17)(20)(80).
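Because the column doubling for double precision is easy to forget, here is a hypothetical helper that builds the card text. It only formats the address, rows, and columns as in the card above and is not part of SOUPAC.

```python
def define_card(address, rows, columns, double_precision=False):
    """Build a #DEFINE prolog card; double precision doubles the column count."""
    if double_precision:
        columns *= 2  # the card is coded with twice as many columns
    return f"#DEFINE ({address})({rows})({columns})."
```

Calling `define_card("DISK 17", 20, 40, double_precision=True)` reproduces the example card exactly.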

Notice that all prolog cards start with a # in column one and must occur
before any SOUPAC program parameter cards. A #-card in the middle of
the SOUPAC program parameter deck is treated as a comment. There is,
however, one #-control card which, while not strictly a prolog card, may
occur in the SOUPAC program parameter deck without being treated as
a comment. This is the #-zero card, the only exception to the rule
that # cards in the middle of the deck are comments.
The #-zero card is essentially a debugging tool to facilitate reading
of dumps if one is needed. It has no particular use for the general user.
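The two prolog cards that take arguments, #OLD and #DEFINE, can be produced mechanically. The sketch below is a hypothetical Python helper (the names old_card and define_card are illustrative, not part of SOUPAC); it encodes only the rules stated above: #OLD needs just an address and a row count, and a double precision #DEFINE codes twice as many columns as the matrix has. The 80-column limit reflects the punched card; everything else about card layout is an assumption.

```python
def old_card(address, rows):
    """Build a #OLD prolog card giving the row count of an existing data set."""
    return f"#OLD ({address}) ({rows})."


def define_card(address, rows, columns, double_precision=False):
    """Build a #DEFINE prolog card for a direct access (DISK) data set.

    For double precision, the same number of rows is coded but twice as
    many columns as the matrix actually has.
    """
    if not str(address).startswith("DISK"):
        raise ValueError("#DEFINE applies to direct access (DISK) addresses")
    coded_columns = 2 * columns if double_precision else columns
    card = f"#DEFINE ({address})({rows})({coded_columns})."
    if len(card) > 80:
        raise ValueError("card image is longer than 80 columns")
    return card


print(old_card("S1", 77))                    # #OLD (S1) (77).
print(define_card("DISK 17", 20, 40, True))  # #DEFINE (DISK 17)(20)(80).
```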


C. $-CONTROL CARDS

$-control cards are used to provide additional information
to a SOUPAC program beyond what is included in the parameters.
There are three $-control cards. All must begin in column one with the character
$ and then continue across the card without blank columns.

1.   $C-B 

The $C-B card supplies as its arguments the variables to
be used as control breaks for a program which accepts control
breaks. Using this card with a program which does
not accept control breaks is an error. The form of this card
is as follows:

$C-B(V1)(V2)...(Vn).

where V1 through Vn are variable numbers and n must be less than or
equal to 24.
2. $INP

$INP has as its arguments a string of input addresses.
The form is:

$INP(A1)...(An).

where A1 through An are input addresses, including cards.
The number of addresses is determined by the program
accepting the $INP card and will be explicitly mentioned in the
program write-up.

3. $OUT

$OUT(A1)...(An).

$OUT has as its arguments a string of output addresses. The
form is the same as that for $INP, and the number of addresses
is also determined by the program accepting the $OUT card.
Multiple output addresses will be accepted. See the section on
multiple output addresses.
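The three $-control card forms share one layout, so a single helper can assemble them. This is a minimal Python sketch with a hypothetical name (dollar_card); it enforces the stated rules (a leading $, each argument in parentheses, no blank columns) and the limit of 24 variables when the caller passes max_arguments=24 for $C-B. The maximum number of $INP or $OUT addresses depends on the program write-up, so it is left to the caller.

```python
def dollar_card(keyword, arguments, max_arguments=None):
    """Build a $-control card such as $C-B, $INP or $OUT.

    The card starts in column one with '$', each argument is enclosed in
    parentheses, the card ends with a period, and no blank columns appear.
    """
    if max_arguments is not None and len(arguments) > max_arguments:
        raise ValueError(f"${keyword} accepts at most {max_arguments} arguments")
    card = "$" + keyword + "".join(f"({arg})" for arg in arguments) + "."
    if " " in card:
        raise ValueError("$-control cards may not contain blank columns")
    if len(card) > 80:
        raise ValueError("card image is longer than 80 columns")
    return card


print(dollar_card("C-B", [3, 7, 12], max_arguments=24))  # $C-B(3)(7)(12).
print(dollar_card("INP", ["C", "S2"]))                   # $INP(C)(S2).
```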
SOUPAC INPUT-OUTPUT AND TEMPORARY STORAGE

I. GENERAL

A.  Input  and  Output  as  Data  Types 

Consider a set of data which a researcher wants intercorrelated.
To compute correlations, the SOUPAC library of statistical procedures includes
a correlation program. Input to the correlation program is the researcher's
raw data; output from the correlation program is a matrix of correlation
coefficients. Similarly, every program has a particular input
(in fact, perhaps several inputs) and some output.

The nature of the input and output of a particular program will
depend on the program and its intent. For example, raw data variables
are input into a correlation program, which outputs a correlation matrix.
But a factor analysis program expects as input a correlation matrix
and yields as output a factor matrix. Although the nature of the input
and output is specific to each statistical program, every program finds
its input somewhere and must put its output somewhere.

B.  Input  and  Output  as  Data  Sources 

SOUPAC is designed in such a manner that the researcher can tell
any program where his inputs are and where to put his outputs. Punched cards
are an obvious input source; printed pages are an obvious output destination.
A punched card deck input into a correlation program, for instance, would
contain raw data variables. In the SOUPAC system, input and output sources
are also called addresses. Thus, a possible input address for a correlation
program is cards, and a possible output address for correlation coefficients
is print. Input and output addresses are parameters to every program in
the SOUPAC system. As the researcher reads a particular program write-up,
he will notice that the position of an address among the parameters determines
which input or output it refers to, and that whether he supplies an input or
output address determines whether he uses or gets that particular input or output.

II.  ELEMENTARY  INPUT/OUTPUT  ADDRESS  AND  TEMPORARY  STORAGE 

A. Possible elementary input and output addresses in the SOUPAC
system are these:

INPUT:   CARDS, SEQUENTIAL 1, SEQUENTIAL 2, ..., SEQUENTIAL 15

OUTPUT:  PRINT, SEQUENTIAL 1, SEQUENTIAL 2, ..., SEQUENTIAL 15
         (See the section on punched output)

Again, CARDS and PRINT are obvious sources. SEQUENTIAL 1 through
SEQUENTIAL 15, however, are the input or output names of 15 temporary storage
regions available to the researcher in the SOUPAC system. These 15
temporary storage regions are provided for exactly that purpose: temporary
storage of data. Notice that with this facility a user can save his
correlation matrix, for example, at SEQUENTIAL 1 and then give SEQUENTIAL
1 as an input address to a factor analysis program. Or a researcher can
construct a copy of his data on temporary storage and then let any number
of programs use the same data as input from the same input address, saving
him the effort of making multiple copies of his card deck so that each
program could read its own deck. Finally, temporary storage addresses
enable the saving of intermediate results for further processing or
modification by other programs, and thereby enable the researcher to
construct his own analysis procedure by providing the appropriate inputs
and outputs to the right programs at the right times.

B.   SOUP  vs  SOUPAC  with  Respect  to  Temporary  Storage 

There are two ways of invoking the SOUPAC system: one can ask
for SOUPAC or SOUP. Note that all 15 temporary storage regions are allocated
to SOUPAC, while only SEQUENTIAL 1 through SEQUENTIAL 5 are allocated to
SOUP. Asking for SEQUENTIAL 6 through SEQUENTIAL 15 when running under
SOUP will cause an error and terminate the job.

All of these input-output addresses may be abbreviated as follows:

CARDS            C
PRINT            P
SEQUENTIAL 1     S1 (or T1)
   ...              ...
SEQUENTIAL 15    S15 (or T15)


T1 through T15 are alternative abbreviations for SEQUENTIAL 1 through
SEQUENTIAL 15. T1 through T15 are, in fact, abbreviations of TAPE 1
through TAPE 15. SEQUENTIAL 1 through SEQUENTIAL 15 and their abbreviations
are the recommended forms. The T1 through T15 notation reflects a real
technical distinction but has been kept to enable programs using that notation
to run.
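The abbreviation rules and the SOUP/SOUPAC limits can be summarized in a short normalizer. This is an illustrative Python sketch with a hypothetical function name; it treats T1 through T15 as synonyms for S1 through S15, as described above, accepts either letter case as a convenience, and does not cover S16 through S40 or DISK addresses, which need extra Job Control cards.

```python
import re

# "SEQUENTIAL n", "Sn" and the older "Tn" all name the same temporary storage region.
_SEQUENTIAL = re.compile(r"(?:SEQUENTIAL |S|T)(\d+)")


def normalize_address(name, direction, system="SOUPAC"):
    """Return the abbreviation C, P or Sn for an elementary address.

    direction is "input" or "output"; system is "SOUPAC" or "SOUP".
    """
    text = " ".join(name.upper().split())  # accept either case; collapse blanks
    if text in ("CARDS", "C"):
        if direction != "input":
            raise ValueError("CARDS is an input address")
        return "C"
    if text in ("PRINT", "P"):
        if direction != "output":
            raise ValueError("PRINT is an output address")
        return "P"
    match = _SEQUENTIAL.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized address: {name!r}")
    number = int(match.group(1))
    highest = 5 if system == "SOUP" else 15  # SOUP allocates only S1..S5
    if not 1 <= number <= highest:
        raise ValueError(f"SEQUENTIAL {number} is not available under {system}")
    return f"S{number}"


print(normalize_address("SEQUENTIAL 1", "output"))  # S1
print(normalize_address("T15", "input"))            # S15
```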


C.   Multiple  Output  Addresses 

A researcher may want to output to several destinations: he may desire
both to print and to save some results for later use. He cannot, however,
input from more than one source for a particular input address. The facility
of multiple output addresses has the following construction:

(output address 1/output address 2/output address 3).

This is the completely general form, providing for up to three separate
outputs. Each output must go to a different kind of destination, however. Thus,
(S1/P/X) is a valid multiple output address providing for temporary storage at S1,
a print of the same data, and a punched copy of the data. (See the section on
punched output for an explanation of X.) (S1/P) will print and store but not punch.
The order of the addresses makes no difference: (S1/P) is equivalent to
(P/S1). Forms such as (S1/S2), however, are not permitted, nor are (P/P)
or (X/X): one can output to only one sequential address and only once to P or X.
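The combination rules for multiple output addresses can be checked mechanically. The Python sketch below uses hypothetical names, accepts only abbreviated addresses (S1 to S15, P, the print-format variants P(F) and P(E), and X), and does not check whether a particular program marks its output address as accepting the three-part form.

```python
import re

_SEQUENTIAL = re.compile(r"[ST](\d+)")


def destination_kind(part):
    """Classify one abbreviated output address as print, punch or sequential."""
    if part in ("P", "P(F)", "P(E)"):
        return "print"
    if part == "X":
        return "punch"
    match = _SEQUENTIAL.fullmatch(part)
    if match and 1 <= int(match.group(1)) <= 15:
        return "sequential"
    raise ValueError(f"not a valid output address: {part!r}")  # includes X(F)


def check_output_address(address):
    """Validate a form such as (S1/P/X) and return its component addresses."""
    text = address.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("an output address is enclosed in parentheses")
    parts = [p.strip() for p in text[1:-1].split("/")]
    if len(parts) > 3:
        raise ValueError("at most three output addresses may be combined")
    kinds = [destination_kind(p) for p in parts]
    if len(set(kinds)) != len(kinds):
        raise ValueError("each output must go to a different kind of destination")
    return parts


print(check_output_address("(S1/P/X)"))  # ['S1', 'P', 'X']
```

Because the check compares kinds rather than positions, (S1/P) and (P/S1) are treated alike, matching the statement that order makes no difference.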

The  above  general  form  is  available  only  if  the  output  address  in 
the  particular  program  is  marked  with  an  fi. 

In all cases, however, the form

(output address 1/output address 2)

is valid unless the program write-up explicitly states a restriction.
D. Print in F Form of Output Print Address.

There is yet another form of output address. This form is
available only where the researcher finds the symbol Q in the program
write-up, and has to do with the kind of printed output. For technical reasons,
most programs print in a form called E-format, which is a form of
scientific notation. This form allows the computer to print numbers of
any size. Some programs for which the output numbers are known to be
constrained, such as correlation coefficients, however, print in a form
called F-format, which is ordinary decimal number representation. F-format
generally cannot print numbers larger than a predetermined size. The
size of the numbers depends on the nature of a researcher's data, but the program
has no way of knowing this; hence the most general form, E-format, is used.

The researcher, however, can optionally specify F-format. To print in
F-format, he would use the following output address:

(P(F)), or (P(F)/S1) if he wanted a multiple output address.

Those programs which already print in F-format, such as those producing
correlation coefficients, can be made to print in E-format by using the
following output address:

(P(E)) or (P(E)/S15).


The different forms look like this:

E-format            Scientific Notation
+ 0.12345E 06       1.2345 x 10^5

F-format            Decimal Number Representation
+ 123456.12345      123456.12345


All four numbers have the same value correct to 5 places. Notice that F-format
cannot represent a number greater than 999999.99999 in absolute value,
whereas E-format can represent the first 5 digits of any number of order of
magnitude up to 10^99. The numbers of digits illustrated for the E and F
formats are the predetermined limits for the size of numbers. E-format is
the more general form, but F-format is easier to read.
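To make the two print forms concrete, here is a Python sketch that renders a number in the layout of the table above: E-format with a five-digit fraction and a two-digit exponent, and F-format with at most six integer and five decimal digits. The function names are hypothetical, and the sketch rounds the last digit; the text does not say whether SOUPAC rounds or truncates.

```python
import math


def e_format(value, digits=5):
    """Render value like '+ 0.12345E 06': sign, fraction in [0.1, 1), exponent."""
    if value == 0:
        return "+ 0." + "0" * digits + "E 00"
    sign = "-" if value < 0 else "+"
    magnitude = abs(value)
    exponent = math.floor(math.log10(magnitude)) + 1
    fraction = round(magnitude / 10.0 ** exponent, digits)
    if fraction >= 1.0:  # e.g. 0.999996 rounds up to 1.00000
        fraction /= 10
        exponent += 1
    exponent_sign = "-" if exponent < 0 else " "  # blank means a positive exponent
    return f"{sign} {fraction:.{digits}f}E{exponent_sign}{abs(exponent):02d}"


def f_format(value, integer_digits=6, decimals=5):
    """Render value in ordinary decimal form, failing when it will not fit."""
    text = f"{abs(value):.{decimals}f}"
    if len(text.split(".")[0]) > integer_digits:
        raise OverflowError(f"{value} does not fit in F-format")
    sign = "-" if value < 0 else "+"
    return f"{sign} {text}"


print(e_format(123456.12345))  # + 0.12346E 06
print(f_format(123456.12345))  # + 123456.12345
```

With rounding, 123456.12345 is rendered as + 0.12346E 06 rather than the + 0.12345E 06 shown in the table, and f_format(1234567.0) raises OverflowError because seven integer digits exceed the F-format limit.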
In E-format, a part such as E 02 is to be understood as
10^2. E 03 would be 10^3, and E-04 would be 10^-4. Thus, .376 E 03 is
.376 x 10^3, or 376., while .1294E-01 is .1294 x 10^-1, or .01294. The sign
following the E determines which way to move the decimal point: left for
negative, right for blank or positive. The number following the sign or
blank determines how many places to move the decimal point.
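The reading rule just described (the sign after E gives the direction, the digits give the number of places) amounts to multiplying the fraction by a power of ten. A small Python sketch with a hypothetical function name, accepting the spaced forms used in the text:

```python
def read_e_format(text):
    """Convert an E-format string such as '.376 E 03' or '.1294E-01' to a float."""
    fraction_part, exponent_part = text.upper().split("E")
    fraction = float(fraction_part.replace(" ", ""))
    # A blank or '+' after the E moves the decimal point right; '-' moves it left.
    exponent = int(exponent_part.replace(" ", "") or "0")
    return fraction * 10.0 ** exponent


print(read_e_format(".376 E 03"))
print(read_e_format(".1294E-01"))
```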


E. Punched Output (Don't forget to specify CARDS= on the ID card!)


All programs which have output addresses marked with the symbol
Q can punch output directly by using the X output address. X is the
abbreviation for cards as output; C used as an abbreviation for an output
address will be an error. Punched output generated by the use of the
X output address will be in E-format (see the section above). X(F) is not
a valid form and will be an error.

If  punched  output  is  desired  in  a  form  other  than  E-format  or 
from  a  program  which  does  not  allow  the  X  output  address,  then  the 
researcher  must  make  a  copy  of  his  data  on  temporary  storage  and  go  to 
the  MATRIX  program  and  use  the  PUNCH  instruction  provided  in  that 
program. 

F. Obtaining Additional Input/Output Sources

Fifteen temporary storage locations may not always be
enough. Additional temporary storage may be obtained by calling
for S16 through S40. Use of S16 through S40 requires the addition of Job
Control Language cards to the 360 system cards of the SOUPAC program deck. At
least the first time, the researcher should check with the SOUPAC consultants
before doing this: first, to learn to do it correctly if he does not already
know how, and second, if he does know how, to make sure none of the Job Control
Language has been changed or modified. Such changes can happen due to 360 system
changes or reconfigurations, or to SOUPAC system changes, which may not be
announced, in contrast to SOUPAC program changes, which would have been
announced.

If in special instances even 40 temporary storage regions are not
sufficient, or a situation arises where so-called DISK temporary storage is
required, temporary storage regions called DISK1 through DISK40 can be
made available. Check with the SOUPAC consultants before using
these for the proper Job Control Cards and the proper SOUPAC prolog cards.

G. Using Own Data Sources or Special Input/Output Requirements
in the SOUPAC System


Users'  own  tapes  or  disk  packs  can  be  used  with  the  SOUPAC 
system  for  input  or  output. 




Special input/output requirements can usually be handled by the
SOUPAC system, provided the requirements can be handled by the 360 system
at all. In such cases, check with the SOUPAC consultants.

General problem types of the nature alluded to above would be
multiple volumes, blocked input/output, formatted or unformatted
input/output, different kinds of record lengths, and different kinds of data
representation due to machine differences or differences in facilities
at other computer installations.
APPENDIX C


SOUPAC Glossary of Terms on Data Representation

BIT- a "binary digit"; e.g., a 0 or 1. A BIT has two states.

BYTE- (also called a CHARACTER) 8 bits. A BYTE (CHARACTER) has 2^8 (256) states.

CHARACTER- (see BYTE).


CONVERSION-  the  process  of  going  from  one  of  the  three  DATA  TYPES  to  another. 
For  example,  the  number  6.25  would  be  represented 
in  CHARACTER  mode  as: 

11110110  01001011  11110010  11110101

in  FLOATING  POINT  mode  as: 

01000001  01100100  00000000  00000000 

and  in  FIXED  POINT  mode  as: 

00000000  00000000  00000000  00000110 

Notice  that  in  the  FIXED  POINT  representation,  the  fractional  part 
has  been  lost. 
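The three encodings of 6.25 can be reproduced with a short Python sketch. Assumptions: Python's cp037 codec (one EBCDIC code page) stands in for the 360 character set, with which it agrees for digits and the period; the floating point encoder truncates the fraction to 24 bits and does no overflow or underflow checking. Function names are illustrative.

```python
def to_character(text):
    """CHARACTER mode: one EBCDIC byte per character (code page 037 assumed)."""
    return text.encode("cp037")


def to_floating(value):
    """FLOATING POINT mode: IBM 360 single precision, fraction truncated to 24 bits."""
    if value == 0:
        return bytes(4)
    sign = 0x80 if value < 0 else 0x00
    magnitude, exponent = abs(value), 0
    while magnitude >= 1.0:  # normalize so that 1/16 <= magnitude < 1
        magnitude /= 16.0
        exponent += 1
    while magnitude < 1.0 / 16.0:
        magnitude *= 16.0
        exponent -= 1
    fraction = int(magnitude * 2 ** 24)
    return bytes([sign | (exponent + 64)]) + fraction.to_bytes(3, "big")  # excess-64


def to_fixed(value):
    """FIXED POINT mode: 32-bit two's complement; int() drops the fraction."""
    return int(value).to_bytes(4, "big", signed=True)


def bits(data):
    return " ".join(f"{byte:08b}" for byte in data)


print(bits(to_character("6.25")))  # 11110110 01001011 11110010 11110101
print(bits(to_floating(6.25)))     # 01000001 01100100 00000000 00000000
print(bits(to_fixed(6.25)))        # 00000000 00000000 00000000 00000110
```

As a cross-check, to_floating(1.0) yields 01000001 00010000 00000000 00000000, the bit pattern used as the example under WORD below.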

DATA  TYPE-  method  or  mode  of  representing  information.   There  are  three 
essential  DATA  TYPES. 


CHARACTER  mode 
FLOATING  POINT  mode 
FIXED  POINT  mode 


Notice  that  a  single  bit  pattern  has  different  meanings  when  interpreted 
under  each  data  type. 

For  example,  the  SINGLE  WORD 

11010111  11000001  11100100  11010011 

in  CHARACTER  mode  means: 
PAUL 

in  FLOATING  POINT  mode  means: 
-.3750452400107475 * 10^28
-3750452400107475. * 10^12

and  in  FIXED  POINT  mode  means: 
-675158829 

The  radix  point  is  essential  to  FLOATING  POINT  representation  and  does 
not  exist  in  FIXED  POINT  representation. 
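Going the other way, here is a Python sketch that interprets one WORD under all three DATA TYPES. It uses the same cp037 assumption for CHARACTER mode and the IBM 360 layout (sign bit, 7 bit exponent in excess-64 form, 24 bit fraction) for FLOATING POINT mode. The function name is hypothetical.

```python
def decode_word(word):
    """Interpret four bytes as EBCDIC characters, an IBM float and an integer."""
    characters = word.decode("cp037")
    sign = -1.0 if word[0] & 0x80 else 1.0
    exponent = (word[0] & 0x7F) - 64  # excess-64 power of 16
    fraction = int.from_bytes(word[1:], "big") / 2 ** 24
    floating = sign * fraction * 16.0 ** exponent
    fixed = int.from_bytes(word, "big", signed=True)  # two's complement
    return characters, floating, fixed


word = bytes([0b11010111, 0b11000001, 0b11100100, 0b11010011])
characters, floating, fixed = decode_word(word)
print(characters)             # PAUL
print(f"{floating:.10e}")     # -3.7504524001e+27
print(fixed)                  # -675158829
```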

DOUBLE PRECISION- a data attribute similar to LENGTH ATTRIBUTE which
specifically indicates that the data item is a DOUBLE WORD in length.
A DOUBLE PRECISION FLOATING POINT NUMBER has 64 bits:

a sign bit

a 7 bit exponent

a 56 bit magnitude (approximately 16.8 decimal digits)

DOUBLE WORD- two WORDS; also 64 BITS; also 8 BYTES (see WORD). A
DOUBLE WORD has 2^64 states.

FIXED POINT NUMBER- (also INTEGER) a 32 bit data item which takes on
only integer values from

-(2^31) to (2^31 - 1)

or equivalently

-2147483648 to 2147483647

The range of FIXED POINT NUMBERS can be represented by

-(2^31), ..., -3, -2, -1, 0, 1, 2, 3, ..., 2^31 - 1


FLOATING POINT NUMBER- a number which is to be internally represented in
a manner similar to so-called "scientific notation." FLOATING POINT
NUMBERS are represented as

S.M * 16^E

where

S is the sign, + or -.

. is the base 16 radix point.

M is the magnitude, where 0 <= M < 1.

In SINGLE PRECISION, M is 24 bits long.

In DOUBLE PRECISION, M is 56 bits long.
* is the symbol for ordinary multiplication.
E is a 7 bit exponent, stored in excess-64 form so that it ranges from -64 to +63.

The range of FLOATING POINT NUMBERS available for both SINGLE PRECISION
and DOUBLE PRECISION can be approximately represented by:

-7.2*10^75 <= f <= -5.4*10^-79,   f = 0,   5.4*10^-79 <= f <= 7.2*10^75


Notice  that  the  precision  does  not  affect  the  overall  range  of  values 
available.   The  precision  only  indicates  the  number  of  values  which 
can  be  represented  exactly  within  the  ranges  given. 
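The approximate limits quoted above follow from the layout, assuming the excess-64 exponent of the IBM 360: the largest fraction is just under 1 and the largest power of 16 is 63, while the smallest normalized fraction is 1/16 and the smallest power is -64. A Python sketch of that arithmetic (the function name is hypothetical), which also converts the fraction length to decimal digits:

```python
import math


def ibm_float_limits(fraction_bits):
    """Approximate limits of IBM 360 floating point (excess-64, base-16 exponent)."""
    smallest = 16.0 ** -65  # normalized fraction 1/16 times 16**-64
    largest = (1.0 - 2.0 ** -fraction_bits) * 16.0 ** 63  # fraction just under 1
    decimal_digits = fraction_bits * math.log10(2)
    return smallest, largest, decimal_digits


for name, width in (("SINGLE", 24), ("DOUBLE", 56)):
    low, high, digits = ibm_float_limits(width)
    print(f"{name}: {low:.1e} to {high:.1e}, about {digits:.2f} decimal digits")
```

Both precisions print the same range, which is the point made above; only the digit count (about 7.22 and 16.86) changes.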

INTEGER- (see FIXED POINT NUMBER)

LENGTH ATTRIBUTE- the number of BYTES in a data element. For example, a
DOUBLE WORD has a LENGTH ATTRIBUTE of 8. A BYTE has a LENGTH ATTRIBUTE
of 1.

REAL NUMBER- (see FLOATING POINT NUMBER)

SINGLE PRECISION- a data attribute similar to LENGTH ATTRIBUTE which
specifically indicates that the data item is one WORD long. A SINGLE
PRECISION FLOATING POINT NUMBER has 32 bits:

a sign bit

a 7 bit exponent

a 24 bit magnitude (approximately 7.2 decimal digits)

SINGLE  WORD-   (see  WORD) 

WORD- (also SINGLE WORD) 32 bits; also 4 bytes. A WORD has 2^32 (4294967296)
states.

Example:   01000001  00010000  00000000  00000000 
Densities of Some Common Probability Distributions


A.   Definitions 

$f(x)$ will be used, in what follows, to denote the probability density
function (p.d.f.).

$N(\mu, \sigma^2)$ denotes the normal distribution having mean $\mu$ and variance $\sigma^2$.
The p.d.f. is given by:

$$ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad -\infty < x < \infty $$

where we may have $-\infty < \mu < \infty$ and $0 < \sigma < \infty$.

$\chi^2(n)$ denotes the chi-square distribution having $n$ degrees of freedom.
The p.d.f. is given by:

$$ f(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\, 2^{n/2}}\, x^{n/2 - 1} e^{-x/2}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases} $$

where $n$ is a positive integer, and $\Gamma$ denotes the well-known "gamma function,"
which is defined by

$$ \Gamma(\alpha) = \int_0^\infty y^{\alpha - 1} e^{-y}\, dy, \qquad \alpha > 0. $$

$\Gamma(\alpha) = (\alpha - 1)!$ when $\alpha$ is integer valued (i.e., a positive integer); in
particular, $\Gamma(1) = 0! = 1$.

$\gamma(a, \lambda)$ or $G(a, \lambda)$ denotes the gamma distribution, with parameters $a$
(degrees of freedom) and $\lambda$. The gamma p.d.f. is given by:

$$ f(x) = \begin{cases} \dfrac{\lambda}{\Gamma(a)}\, (\lambda x)^{a-1} e^{-\lambda x}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases} $$

Here we require $\lambda > 0$ and $a > 0$. Note that with $\lambda = 1/2$ and integral $a$,
we have $\chi^2(2a)$, while $\gamma(1, \lambda)$ is usually called the "negative exponential
distribution" with parameter $\lambda$.
$t(n)$ denotes the t-distribution with $n$ degrees of freedom. Its p.d.f.
is

$$ f(x) = \frac{\Gamma(n/2 + 1/2)}{\Gamma(n/2)\,\sqrt{n\pi}\,\left(1 + x^2/n\right)^{(n+1)/2}}, \qquad -\infty < x < \infty, $$

where $n$ is a positive integer.

$F(n_1, n_2)$ denotes the F-distribution with $n_1$ degrees of freedom "in the
numerator" and $n_2$ degrees of freedom "in the denominator." The p.d.f. is:

$$ f(x) = \begin{cases} \dfrac{\Gamma(n_1/2 + n_2/2)\,(n_1/n_2)^{n_1/2}\, x^{n_1/2 - 1}}{\Gamma(n_1/2)\,\Gamma(n_2/2)\,\left(1 + n_1 x/n_2\right)^{(n_1+n_2)/2}}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases} $$

where $n_1, n_2$ are positive integers. The F-distribution arises in practice
from the quotient $(X_1/n_1)/(X_2/n_2)$, where $X_1$ is $\chi^2(n_1)$, $X_2$ is $\chi^2(n_2)$,
and $X_1$ and $X_2$ are independent; hence the terminology "degrees of freedom in the numerator."

$\mathrm{Beta}(p, q)$ denotes the beta distribution with parameters $p$ and $q$. Its
p.d.f. is

$$ f(x) = \begin{cases} \dfrac{\Gamma(p+q)}{\Gamma(p)\,\Gamma(q)}\, x^{p-1} (1-x)^{q-1}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} $$

$\mathrm{Beta}(1, 1)$ is constant on the interval $(0, 1)$ and is commonly called the
rectangular, or uniform, distribution on $(0, 1)$.


$\mathrm{Cauchy}(t)$ denotes the Cauchy distribution with parameter $t$. The p.d.f.
is:

$$ f(x) = \frac{t}{\pi\,(t^2 + x^2)}, \qquad -\infty < x < \infty, $$

where $t$ is a positive scale parameter. The graph of this distribution
resembles that of a Normal distribution, but the Cauchy distribution behaves
more pathologically: its tails are much heavier, and it has no finite mean or variance.
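For numerical checks, the densities above translate directly into Python using math.gamma. This is a sketch: the function names are hypothetical, the gamma density uses the (a, λ) ordering of the definition above (shape a, rate λ), and no argument validation is done beyond returning 0 outside the support.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def chi_square_pdf(x, n):
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (math.gamma(n / 2) * 2 ** (n / 2))


def gamma_pdf(x, a, lam):
    """Gamma density with a degrees of freedom (shape) and parameter lam (rate)."""
    if x <= 0:
        return 0.0
    return lam * (lam * x) ** (a - 1) * math.exp(-lam * x) / math.gamma(a)


def t_pdf(x, n):
    return math.gamma((n + 1) / 2) / (
        math.gamma(n / 2) * math.sqrt(n * math.pi) * (1 + x * x / n) ** ((n + 1) / 2))


def f_pdf(x, n1, n2):
    if x <= 0:
        return 0.0
    numerator = math.gamma((n1 + n2) / 2) * (n1 / n2) ** (n1 / 2) * x ** (n1 / 2 - 1)
    denominator = (math.gamma(n1 / 2) * math.gamma(n2 / 2)
                   * (1 + n1 * x / n2) ** ((n1 + n2) / 2))
    return numerator / denominator


def beta_pdf(x, p, q):
    if not 0 < x < 1:
        return 0.0
    return (math.gamma(p + q) / (math.gamma(p) * math.gamma(q))
            * x ** (p - 1) * (1 - x) ** (q - 1))


def cauchy_pdf(x, t=1.0):
    return t / (math.pi * (t * t + x * x))


# chi-square(n) coincides with the gamma density with a = n/2 and lambda = 1/2.
print(chi_square_pdf(3.7, 5), gamma_pdf(3.7, 2.5, 0.5))
```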

B. Relations Among Distributions

We will need to introduce some notation:

$\sim$ means "distributed as," or "has the distribution."

I.I.D. is an abbreviation for "independently and identically distributed."

$X$ will be used to denote a random variable, and sometimes will be indexed as $X_i$.

$\{X_i\}_{i=1}^{n}$ denotes a finite sequence of random variables, or more informally,
an ordered (finite) collection of random variables.

$\iff$ reads as "is equivalent to the following."

$F(c) = P[X \le c]$ is the probability that the random variable $X$ is less than or equal to $c$.
$F(c)$ is called the cumulative distribution function.

$X \sim N(\mu, \sigma^2) \iff (X - \mu)/\sigma \sim N(0, 1)$.

If $X \sim N(0, 1)$, then $X^2 \sim \chi^2(1)$.

If $\{X_i\}_{i=1}^{n}$ are I.I.D. $\chi^2(1)$ and $Y = \sum_{i=1}^{n} X_i$, then $Y \sim \chi^2(n)$.

If $X \sim N(0, 1)$, $Y \sim \chi^2(n)$, $X$ and $Y$ are independent, and $Z = X/\sqrt{Y/n}$,
then $Z \sim t(n)$.

If $X_1 \sim \chi^2(n_1)$ and $X_2 \sim \chi^2(n_2)$ are independent, and $Y = (X_1/n_1)/(X_2/n_2)$,
then $Y \sim F(n_1, n_2)$.

If $X \sim t(n)$, then $Y = X^2$ satisfies $Y \sim F(1, n)$.

If $X_1, X_2$ are I.I.D. $N(0, 1)$, then $Y = X_1/X_2$ satisfies $Y \sim \mathrm{Cauchy}(1)$.

If $X$ is $t(1)$, then $X$ is $\mathrm{Cauchy}(1)$.

If $X \sim U(-\pi/2, \pi/2)$ and $Y = \tan X$, then $Y$ is $\mathrm{Cauchy}(1)$.

$X \sim \gamma(n, \lambda) \iff \lambda X \sim \gamma(n, 1)$.

$X \sim \chi^2(n) \iff X \sim \gamma(n/2, 1/2)$.

If $X \sim \gamma(1, \lambda)$, then $X$ has what is commonly called the "negative exponential
distribution" with parameter $\lambda$ (denoted $\exp \lambda$).

If $\{X_i\}_{i=1}^{n}$ are I.I.D. $\exp \lambda$, then $\sum_{i=1}^{n} X_i \sim \gamma(n, \lambda)$.

If $(n/m)X \sim F(m, n)$, then $X/(1 + X) \sim \mathrm{Beta}(m/2, n/2)$.


APPLICATION  OF  ABOVE  TO  RND  PROGRAM 


With the modest selection of distributions provided, and equipped with
knowledge of the effect transformations have on a distribution (the above list
provides a good start), many additional distributions may be obtained.

EXAMPLE 1. (Inverse Function method). Given an arbitrary continuous
cumulative distribution function $F(x)$ and its inverse $F^{-1}(y)$, applying $F^{-1}$ to a
sequence $\{X_i\}_{i=1}^{n}$ of I.I.D. $U(0, 1)$ random variables yields a sequence
$\{F^{-1}(X_i)\}_{i=1}^{n}$ of I.I.D. random variables with distribution function $F(x)$.
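A Python sketch of the inverse function method, using two distributions whose inverse c.d.f. has a closed form: the negative exponential with parameter λ, F(x) = 1 - exp(-λx), and Cauchy(1), F(x) = 1/2 + arctan(x)/π (the latter is the tan relation in the list above). Python's random module stands in for the RND program's uniform generator, and the function names are hypothetical.

```python
import math
import random


def exponential_inverse_cdf(u, lam):
    """Negative exponential: F(x) = 1 - exp(-lam * x), so F^-1(u) = -ln(1 - u) / lam."""
    return -math.log(1.0 - u) / lam


def cauchy_inverse_cdf(u):
    """Cauchy(1): F(x) = 1/2 + arctan(x) / pi, so F^-1(u) = tan(pi * (u - 1/2))."""
    return math.tan(math.pi * (u - 0.5))


def inverse_function_sample(inverse_cdf, count, rng):
    """Apply an inverse c.d.f. to `count` uniform(0, 1) random numbers."""
    return [inverse_cdf(rng.random()) for _ in range(count)]


rng = random.Random(12345)
exponential = inverse_function_sample(lambda u: exponential_inverse_cdf(u, 2.0), 10000, rng)
cauchy = inverse_function_sample(cauchy_inverse_cdf, 10001, rng)
print(sum(exponential) / len(exponential))  # close to 1 / lambda = 0.5
```

The Cauchy sample is checked through its median rather than its mean, since, as noted above, the Cauchy distribution has no finite mean.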

EXAMPLE 2. Knowing that $\gamma(n, 1/2)$ is equivalent to $\chi^2(2n)$, and using the
relation above that gives $t(n)$ as $X/\sqrt{Y/n}$, we may generate a first matrix of
$\chi^2(2n)$ random numbers and replace each element $Y$ by $\sqrt{Y/2n}$, generate a second
matrix of $N(0,1)$ random numbers, and, by elementwise dividing the second matrix by
the first, obtain a matrix of $t(2n)$ random numbers.
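Example 2 written as a Python sketch, with random.gammavariate and random.gauss standing in for the RND program's generators (an assumption; the function name t_matrix is hypothetical). gammavariate takes a shape and a scale, so the rate 1/2 of the gamma distribution above becomes a scale of 2.

```python
import math
import random


def t_matrix(rows, cols, n, rng):
    """Matrix of t(2n) variates built as N(0,1) / sqrt(chi2(2n) / 2n), elementwise."""
    # First matrix: chi-square(2n) = gamma with shape n and rate 1/2 (scale 2),
    # with each element Y replaced by sqrt(Y / 2n).
    first = [[math.sqrt(rng.gammavariate(n, 2.0) / (2 * n)) for _ in range(cols)]
             for _ in range(rows)]
    # Second matrix: standard normal variates.
    second = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    # Elementwise quotient of the second matrix by the first.
    return [[z / s for z, s in zip(zrow, srow)] for zrow, srow in zip(second, first)]


rng = random.Random(2024)
sample = t_matrix(100, 100, 3, rng)  # t(6) variates; t(6) has variance 6/4 = 1.5
```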
Other Programs

In  addition  to  the  statistical  programs  in  the  SOUPAC  system,  the  SOUPAC 
group  maintains  on  OS/360  disk  file  a  library  of  other  programs  which  are  briefly 
described  below.   Details  about  these  programs  and  the  means  of  accessing  them 
are  available  at  the  SOUPAC  Office. 


PTSVIEW  (Points  of  View  Analysis) 

This program performs a factor analytic points of view analysis following
a procedure developed by Tucker and Messick (1963). At the user's specification,
the program computes either 1) cross-products, 2) covariances, or 3)
correlations from the raw data (usually judgments). The analysis is then
performed on this product matrix.

The program may be used for a second "pass" with both the original data
matrix and a second "hypothetical subjects matrix" (containing coordinates of the
idealized individuals) as input. The result is a matrix of hypothetical
judgments, one set for each idealized individual; these are the judgments that, under
the model, would have been made by each of the idealized individuals.

TORSCA (Nonmetric Multidimensional Scaling)

This  program  performs  nonmetric  multidimensional  scaling.   The  program 
computes  a  geometric  representation  of  a  data  matrix  such  that  the  distances 
between  the  points  in  the  representation  best  reproduce  the  order  of  the  entries 
in  the  data  matrix.   The  geometric  representation  may  be  in  any  Minkowski  space 
(including  city-block  space  and  Euclidean  space),  and  the  order  being  reproduced 
may  be  the  inverse  of  the  order  of  the  entries  in  the  data  matrix.   Finally,  the 
data  matrix  may  be  either  a  rectangular  matrix  or  a  symmetric  matrix.   In  the 
former  case,  it  is  assumed  that  the  space  to  be  derived  is  a  joint  space  of  both 
row  and  column  variables. 
TPOLY (Least Squares Polynomial Fit)

Instead of fitting a polynomial in standard form to a set of data points by
least squares, TPOLY fits a linear combination of Chebychev polynomials. The
resulting normal equations are then solved by Cholesky's method. Both the coefficients
of the linear combination of Chebychev polynomials and the coefficients of the
equivalent polynomial in standard form are calculated and printed.

This method avoids inverting a possibly ill-conditioned matrix, and its
round-off error properties are excellent. However, the variance-covariance
estimates of the coefficients are lost.
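The approach described for TPOLY (least squares in the Chebychev basis, normal equations solved by Cholesky's method, then conversion to standard form) can be sketched in plain Python. This is not TPOLY's code: the function names are hypothetical, and the sketch assumes the x values already lie in [-1, 1], the natural interval of the Chebychev polynomials, instead of rescaling them.

```python
import math


def chebychev_row(x, degree):
    """[T_0(x), ..., T_degree(x)] from T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)."""
    row = [1.0, x]
    while len(row) <= degree:
        row.append(2 * x * row[-1] - row[-2])
    return row[: degree + 1]


def cholesky_solve(a, b):
    """Solve the symmetric positive-definite system a z = b by Cholesky's method."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][i] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    y = [0.0] * n  # forward substitution: lower * y = b
    for i in range(n):
        y[i] = (b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i]
    z = [0.0] * n  # back substitution: transpose(lower) * z = y
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(lower[k][i] * z[k] for k in range(i + 1, n))) / lower[i][i]
    return z


def chebychev_to_standard(cheb):
    """Re-express sum c_k T_k(x) as coefficients of 1, x, x^2, ..."""
    basis = [[1.0], [0.0, 1.0]]  # T_0 and T_1 in powers of x
    while len(basis) < len(cheb):
        prev, curr = basis[-2], basis[-1]
        nxt = [0.0] + [2.0 * c for c in curr]
        for i, c in enumerate(prev):
            nxt[i] -= c
        basis.append(nxt)
    standard = [0.0] * len(cheb)
    for coefficient, poly in zip(cheb, basis):
        for power, c in enumerate(poly):
            standard[power] += coefficient * c
    return standard


def chebychev_fit(xs, ys, degree):
    """Least-squares fit in the Chebychev basis; returns both coefficient sets."""
    rows = [chebychev_row(x, degree) for x in xs]
    m = degree + 1
    normal = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    cheb = cholesky_solve(normal, rhs)
    return cheb, chebychev_to_standard(cheb)


xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [1.0 + 2.0 * x - 3.0 * x * x for x in xs]
cheb, standard = chebychev_fit(xs, ys, 2)
print([round(c, 6) for c in cheb])      # [-0.5, 2.0, -1.5]
print([round(c, 6) for c in standard])  # [1.0, 2.0, -3.0]
```

The data here lie exactly on 1 + 2x - 3x^2, which in the Chebychev basis is -0.5 T_0 + 2 T_1 - 1.5 T_2, so both coefficient sets can be checked by hand. Solving the normal equations this way returns no coefficient standard errors, consistent with the note above that the variance-covariance estimates are lost.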

UMAVAC  (Univariate  and  Multivariate  Analysis  of  Variance  and  Covariance) 

UMAVAC performs univariate and multivariate linear estimation and tests of
hypotheses for any crossed and/or nested design, with or without concomitant
variables. The number of observations may be equal, proportional or
disproportionate, the latter including missing observations and incomplete designs.

Among the possible analyses which can be performed by this program are
regression analysis (including canonical correlation analysis and step-wise
regression analysis), analysis of variance, and discriminant analysis.